MIT’s Speech Recognition Baby
The Massachusetts Institute of Technology (MIT) may be on the verge of a breakthrough in speech and video processing technology. Its test subject: a 9-month-old baby boy at the center of a project called “The Human Speechome Project.”
Associate Professor Deb Roy, head of the MIT Media Lab’s Cognitive Machines research group, has wired his home with 11 overhead, omni-directional fisheye video cameras and 14 ceiling-mounted microphones. In the basement, a 5-terabyte disk cache holds recorded home activity until Roy trucks it over to the Media Lab for analysis. Bell Microproducts, Seagate Technology, Marvell and Zetera donated the petabyte (1 million gigabyte) storage system.
This hyper-sophisticated surveillance equipment will record Roy’s son’s waking hours for three years (some 400,000 hours of audio and video across all the sensors), with the objective of discovering how humans naturally acquire language in social settings. Roy will continuously observe the physical and social surroundings that lead a baby from “Mama” and other early words to more complex grammatical constructions.
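The 400,000-hour figure is most plausibly read as total recording hours summed across all 25 sensors rather than a single child’s waking hours. A rough back-of-envelope check, assuming the upper end of the reported 12-14 recording hours per day over three years (the per-day and sensor counts come from the article; the interpretation is ours):

```python
# Sanity check: total sensor-hours over the three-year recording window,
# assuming ~14 recording hours/day across all 25 devices.
sensors = 11 + 14          # 11 fisheye cameras plus 14 microphones
hours_per_day = 14         # upper end of the reported 12-14 h/day
days = 3 * 365             # three-year recording window

total_sensor_hours = sensors * hours_per_day * days
print(total_sensor_hours)  # 383250 -- roughly the quoted 400,000 hours
```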
“Just as the Human Genome Project illuminates the innate genetic code that shapes us, the Speechome project is an important first step toward creating a map of how the environment shapes human development and learning,” said Frank Moss, director of the Media Lab.
The Roys are recording about 300 gigabytes of compressed data per day, covering an average of 12 to 14 hours of daily activity. In case Dad has to streak through the house to answer the phone, each room has controls to switch off video or audio recording, and there is even an “oops button” to erase accidental moments.
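At that rate, the donated petabyte system has ample headroom. A quick estimate, using only the figures quoted in the article:

```python
# Rough storage estimate for the full recording period, assuming a
# steady 300 GB/day of compressed data over three years.
gb_per_day = 300
days = 3 * 365

total_gb = gb_per_day * days
total_tb = total_gb / 1000
print(total_tb)  # 328.5 TB -- comfortably within a petabyte (1,000 TB)
```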
Roy’s team will develop machine learning systems that apply a variety of speech and video processing algorithms to test hypotheses about how children learn and to make sense of the behavioral and communication patterns embedded in the collected data.
Through this analysis, they hope to expose basic movement patterns within the home (e.g., a person moving from room to room), as well as more complex behaviors (e.g., changing a diaper or putting away dishes). MIT says the effort is one of the most extensive scientific analyses of long-term infant learning patterns ever undertaken.
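A minimal sketch of the kind of room-occupancy inference described above: simple frame differencing per camera feed to decide which room is active at each time step. The frames here are tiny synthetic grayscale grids, and names like `detect_motion` and `active_rooms` are illustrative assumptions, not functions from the project itself.

```python
# Sketch: infer which room has activity by comparing consecutive frames.
def detect_motion(prev_frame, frame, threshold=10, min_changed=3):
    """Return True if enough pixels changed between consecutive frames."""
    changed = sum(
        1
        for prev_row, row in zip(prev_frame, frame)
        for p, q in zip(prev_row, row)
        if abs(p - q) > threshold
    )
    return changed >= min_changed

def active_rooms(feeds):
    """Map each room name to the time steps at which motion was detected."""
    events = {}
    for room, frames in feeds.items():
        events[room] = [
            t
            for t, (prev, cur) in enumerate(zip(frames, frames[1:]), start=1)
            if detect_motion(prev, cur)
        ]
    return events

# Two simulated rooms: motion appears in the kitchen at step 1 only.
still = [[0] * 4 for _ in range(4)]
moved = [[0, 0, 0, 0], [0, 50, 50, 0], [0, 50, 50, 0], [0, 0, 0, 0]]
feeds = {"kitchen": [still, moved, moved], "nursery": [still, still, still]}
print(active_rooms(feeds))  # {'kitchen': [1], 'nursery': []}
```

A production system would of course use real computer-vision pipelines over the fisheye footage; the point here is only the shape of the per-room, per-timestep inference.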
What we may learn about human speech development is interesting enough on its own, but Moss notes that the research could have a wide impact on other technological realms.
“Equally exciting are the ‘spinoff’ opportunities that could result from this research. The innovative tools that are being developed for storing and mining thousands of terabytes of speech and video data offer enormous potential for breaking open new business opportunities for a broad range of industries — from security to Internet commerce,” Moss said.
The project received seed funding from the National Science Foundation.