Google Books Ngram Viewer Gets A Larger Dataset, Now Understands Parts Of Speech

Google’s Ngram Viewer for Google Books, a tool that lets you see how the usage of specific words has increased and decreased over time, just got an update. The Ngram Viewer now draws upon a larger dataset (though Google sadly doesn’t say exactly how large it now is) and has gained a few new features for more advanced analysis.

Most importantly, though, the Ngram Viewer is now a lot smarter. Last year, Google’s Natural Language Processing group built a system that can reliably identify parts of speech. Thanks to this, the Ngram Viewer now knows how often a given word in its corpus was used as a noun or verb, for example.
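To make the idea concrete, here is a minimal sketch of what counting a word by part of speech could look like. It is not Google's system: the tiny hand-tagged token list, the tag names and the `pos_counts` helper are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical, hand-tagged tokens as (word, part_of_speech) pairs.
# This is invented sample data, not anything from the Google Books corpus.
TAGGED_TOKENS = [
    ("record", "NOUN"), ("record", "VERB"), ("record", "NOUN"),
    ("player", "NOUN"), ("record", "VERB"), ("record", "NOUN"),
]


def pos_counts(tagged_tokens, word):
    """Count how often `word` appears under each part-of-speech tag."""
    return Counter(tag for token, tag in tagged_tokens if token == word)


counts = pos_counts(TAGGED_TOKENS, "record")
print(counts["NOUN"], counts["VERB"])
```

Once every token carries a tag, separating the noun "record" from the verb "record" is just a matter of grouping by that tag; the hard part, which this sketch skips entirely, is assigning the tags reliably in the first place.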

The main new feature in this release is the ability to add, subtract, multiply and divide Ngram counts. This, says Google, allows you to see “how ‘record player’ rose at the expense of ‘Victrola,’” for example.
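As a rough illustration of what arithmetic on Ngram counts means, the sketch below combines two per-year count series with an ordinary operator. The numbers are invented and the `combine` helper is hypothetical; it shows the general idea only, not how the Ngram Viewer computes its graphs.

```python
import operator

# Invented per-year counts, purely for illustration (not real Ngram data).
record_player = {1920: 10, 1950: 200, 1980: 900}
victrola = {1920: 400, 1950: 100, 1980: 30}


def combine(series_a, series_b, op):
    """Apply `op` year by year, using only years present in both series."""
    years = sorted(series_a.keys() & series_b.keys())
    return {year: op(series_a[year], series_b[year]) for year in years}


# Dividing one series by the other shows one term overtaking the other.
ratio = combine(record_player, victrola, operator.truediv)
# Subtracting gives the absolute gap between the two terms.
gap = combine(record_player, victrola, operator.sub)

for year in ratio:
    print(year, ratio[year], gap[year])
```

With these made-up figures, the ratio climbs from well below 1 to well above it, which is the kind of "rose at the expense of" pattern that a single combined curve makes easier to see than two separate ones.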

The Ngram Viewer team also added Italian to the set of supported languages, which already included English, Chinese, Spanish, French, German, Hebrew and Russian.

At first glance, a tool like this would seem to be of interest mostly to historical linguists in academia, but the project has actually been a major mainstream success for Google. As Google Engineering Manager and the project’s co-creator Jon Orwant notes in today’s announcement, he and the rest of the team were surprised by “its popularity among casual users.” Since its launch in 2010, he writes, the Ngram Viewer has been used about 50 times per minute and over 45 million graphs have been created with it.