Microsoft announces breakthrough in Chinese-to-English machine translation

A team of Microsoft researchers announced on Wednesday that they’ve created the first machine translation system capable of translating news articles from Chinese to English with the same accuracy as a person. The company says it tested the system repeatedly on a sample of around 2,000 sentences from various online newspapers, comparing the results against human translations and even hiring outside bilingual language consultants to further verify the machine’s accuracy.

The sample set, called newstest2017, was released just last fall at the research conference WMT17.

It’s surprising, then, how quickly the researchers were able to achieve this milestone – especially given that machine translation is a problem people have been trying to solve for decades.

Many have even believed that the goal of human parity would never be realized, Microsoft notes.

“Hitting human parity in a machine translation task is a dream that all of us have had,” said Xuedong Huang, a technical fellow in charge of Microsoft’s speech, natural language and machine translation efforts, in Microsoft’s blog post. “We just didn’t realize we’d be able to hit it so soon.”

Getting a machine to understand language at this scale is far more complicated than speech recognition, which has seen a number of advances in recent years. Progress in A.I. and speech recognition has brought voice assistants onto our smartphones and into our homes, where they help consumers with everyday computing tasks, control smart home devices, and deliver news and entertainment.

But asking for a machine translation of a web page or news article still often produces a hard-to-understand mess of words that, at best, gives you a general idea of what’s being said but is nearly impossible to understand in any depth.

To really understand what’s being said in longer articles, you’d need a person’s help.

Even then, two human translators may render the same sentence slightly differently, with neither being wrong.

“Machine translation is much more complex than a pure pattern recognition task,” said Ming Zhou, assistant managing director of Microsoft Research Asia and head of a natural language processing group that worked on the project. “People can use different words to express the exact same thing, but you cannot necessarily say which one is better.”

Recent breakthroughs in A.I. contributed to researchers achieving this milestone, Microsoft also notes.

Deep neural networks, a method of training A.I. systems, allowed the researchers to create more fluent and natural-sounding translations that take into account broader context than the prior approach, known as statistical machine translation.

Microsoft’s researchers also added their own training methods to the system to improve its accuracy – things they equate to how people go over their own work time and again to make sure it’s right.

The researchers said they used methods including dual learning, to fact-check translations; deliberation networks, to repeat translations and refine them; joint training, a newer technique to iteratively boost the English-to-Chinese and Chinese-to-English translation systems; and agreement regularization, which can generate translations by reading sentences both left-to-right and right-to-left.
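Dual learning builds on the fact that translation runs in two directions, so one direction can be used to check the other. As a rough illustration, the toy sketch below translates a sentence from Chinese to English and back, then measures how much of the original survives; a low score suggests meaning was lost along the way. The lookup tables and the functions `translate` and `round_trip_score` are hypothetical stand-ins for real neural models, and this round-trip check is a simplified analogy, not Microsoft’s actual system or training procedure.

```python
from difflib import SequenceMatcher

# Hypothetical toy "translators": word-for-word lookup tables standing in for
# real neural translation models. Illustration only; not Microsoft's system.
ZH_TO_EN = {"我": "I", "爱": "love", "新闻": "news"}
EN_TO_ZH = {en: zh for zh, en in ZH_TO_EN.items()}


def translate(tokens, table):
    """Translate token by token; words missing from the table pass through unchanged."""
    return [table.get(token, token) for token in tokens]


def round_trip_score(source_tokens, forward_table, backward_table):
    """Translate source -> target -> source and return how much of the
    original survives, as a similarity ratio between 0.0 and 1.0."""
    target = translate(source_tokens, forward_table)
    reconstructed = translate(target, backward_table)
    return SequenceMatcher(None, source_tokens, reconstructed).ratio()


sentence = ["我", "爱", "新闻"]  # "I love news"

# A consistent pair of systems reconstructs the source perfectly.
print(round_trip_score(sentence, ZH_TO_EN, EN_TO_ZH))  # 1.0

# A backward system that cannot handle "love" loses part of the meaning,
# which lowers the score and flags the translation for another look.
lossy_backward = {"I": "我", "news": "新闻"}
print(round_trip_score(sentence, ZH_TO_EN, lossy_backward))  # about 0.67
```

In dual learning as described in the research literature, this kind of round-trip feedback serves as a training signal for both models rather than as a one-off check; the sketch only conveys the intuition of letting one direction verify the other.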

Zhou said the techniques used to achieve the milestone won’t be limited to machine translation.

“This is an area where machine translation research can apply to the whole field of AI research,” Zhou said.

In addition, the work could enable more accurate and natural translations for other language pairs in the future.

The researchers caution that the system has not yet been tested on real-time news stories, and that other challenges still lie ahead before the technology could be commercialized in Microsoft’s products.

In the meantime, you can try out the new translation system on Microsoft’s website. (This is not the production system and may run slower at times, the site warns.)

The system shows a sentence in Chinese (simplified), which is then translated two ways, with the improved translation on the right to demonstrate the gains.

Machine translation is something researchers at Google have been working on as well, including with their own neural network-based machine learning technique for Chinese-English queries. These advances have already been put to work in Google’s consumer-facing products, like the Google Translate app and its integrations in Google search.

Image, top: Xuedong Huang, technical fellow in charge of Microsoft’s speech, natural language and machine translation efforts. (Photo by Scott Eklund/Red Box Pictures)