Endangered Languages Project: Google Wants To Save 3,000 Languages Under Threat (Thanks Partly To Google)

Google, along with other major forces of our dot-com times, has played a huge part in making English the lingua franca of the Internet, but today brings news of an effort it’s making to counterbalance that, at least a little. It is kicking off the Endangered Languages Project, a site where interested groups and individuals can share research and collaborate on projects to help preserve languages that are under threat in the modern age. The aim is to document some 3,000 languages (half of all the world’s languages) that are “on the verge of extinction,” Google’s Clara Rivera Rodriguez and Jason Rissman write in a blog post today.

Google says the new site will contain a number of tools and resources to help keep some of those alive: there will be high-quality recordings of people speaking the languages, copies of historical manuscripts, e-learning options, and even niche-language social networking opportunities, in addition to research and other documentation.

Google is behind the development and launch (read: funding) of the new site, but it says that over the next few months it plans to hand over management and further development to a group of non-profits and academics that specialize in the area of language preservation.

The First Peoples’ Cultural Council (FPCC) will lead on strategy and outreach, and the Institute for Language Information and Technology (the LINGUIST List) at Eastern Michigan University will be the technical lead. Google will continue to have a place on the advisory committee for the site. The site will use data from the Catalog of Endangered Languages, compiled by researchers from the University of Hawai’i at Manoa and Eastern Michigan University.

It remains to be seen whether this is just a bit of nice PR that will mainly benefit collaboration between obscure-language academics, or whether it will have wider-ranging implications. At least in these early stages, it looks like the former. Also: what exactly constitutes a language that is endangered? “Does it include #sanskrit?” one person asked me via Twitter. The site, for its part, details four levels of language crisis, which users can find via an interactive map on the site: “at risk,” “endangered,” “severely endangered,” and the ominous-sounding “vitality unknown.”

If the site’s leaders can find ways of using this project effectively to reach the right targets, there is much to gain from preserving disappearing languages: language is directly tied to cultural identity, to history and, by consequence, to the future. Technology has a lot to answer for when it comes to certain languages gaining supremacy over others, so it’s good to see an effort in which technology is being used to claw at least some of that dominance back.

It’s also not the first time Google has tried to do something cool to bring uncommon languages onto the Internet. Last year, it added Cherokee to its list of languages supported in search. You can see how that looks here.

[youtube http://www.youtube.com/watch?v=Bn2QbwcjmOI&w=560&h=315]