Historic Breakthrough: Microsoft Reaches Virtual Parity With Human Speech

Rich Ord | Technology



In an historic breakthrough, Microsoft's AI team has developed technology that recognizes speech as well as humans do. The research team published a paper showing that its speech recognition system makes errors at the same rate as professional transcriptionists: a word error rate of 5.9%.

The IBM Watson research team published a word error rate (WER) of 6.9% earlier this year. The team noted that its previous WER of 8%, announced in May 2015, was itself 36% better than previously reported external results.
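For readers unfamiliar with the metric: WER is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the system's output, divided by the number of words in the reference. The Python sketch below is illustrative only and is not the scoring tool Microsoft or IBM used; it assumes simple whitespace tokenization with no text normalization, and the helper names `word_error_rate` and `relative_reduction` are hypothetical. The last line compares the 6.9% and 5.9% figures above, assuming they were measured on comparable test sets.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization and no normalization (illustrative only).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on old_wer (hypothetical helper)."""
    return (old_wer - new_wer) / old_wer


# One dropped word out of six reference words -> WER of 1/6
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
# Going from 6.9% to 5.9% is roughly a 14.5% relative reduction
print(round(relative_reduction(0.069, 0.059), 3))  # 0.145
```

Note that an absolute one-point drop in WER is a much larger relative gain at low error rates, which is why small-looking differences between published figures are treated as significant.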

Clearly, artificial intelligence technology is advancing at a pace that will make machine word recognition superior to human word recognition within a matter of months. Of course, WER is only one yardstick; the technology must keep improving before it achieves true comprehension and can produce human-level responses.

Microsoft, IBM, Apple, Google, Amazon and a host of other companies are on a mission to use AI to build speech recognition into virtually every device. To make the IoT truly meaningful to people, we will need to be able to talk to connected devices in our own language. According to Cloudera, more than 30 billion things will be connected to the internet by 2020.

"We’ve reached human parity," said Xuedong Huang, who leads Microsoft's Advanced Technology Group and is regarded as the company's chief speech scientist. "This is an historic achievement."

Microsoft says the milestone will have broad implications for consumer and business products, including devices such as Xbox and personal digital assistants such as Cortana.

"This will make Cortana more powerful, making a truly intelligent assistant possible," notes Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group. "Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible."

"The next frontier is to move from recognition to understanding," said Geoffrey Zweig, who manages the Speech & Dialog research group.

The holy grail, according to Shum, is "moving away from a world where people must understand computers to a world in which computers must understand us."

At the rate the technology is advancing, that goal now seems within reach.
