Thomson Reuters’ OpenCalais, a service and open API that lets users incorporate semantic tagging into blogs, content management systems, or website content, has been upgraded to include social tagging, support for Spanish content, and deeper linked data for companies. OpenCalais’ technology is powered by text analytics company ClearForest, which Reuters acquired in 2007. OpenCalais, which is free, uses natural language processing, machine learning and other methods to analyze a document and find the entities within it. CNET and Huffington Post are among the blogs and sites that use OpenCalais.
OpenCalais 4.1 (released today) and 4.2 (to be released in a few weeks) introduce a new social component that emulates how a human might tag a document. While OpenCalais is a semantic data extraction engine, it doesn’t necessarily extract the kind of tags a human would put on an article. For example, in an article about luxury cars, OpenCalais would be able to pick out BMW and Porsche as tags but wouldn’t necessarily pick out descriptions like “sports cars” or “automobiles.” OpenCalais’s technology will now generate these sorts of tags, called “Social Tags,” by analyzing content and mapping it to a knowledge base built on Wikipedia and other sites.
The new version will also extract tags from content written in Spanish. OpenCalais previously supported English and French. One of the neat things about OpenCalais is that it lets publishers combine their content with Linked Data assets from Wikipedia, IMDB and other databases. The new version also upgrades the Linked Data features for company data, linking to new enterprise information sites like our own CrunchBase. And OpenCalais’s semantic entity database has been expanded to include recession-relevant terms such as accounting changes, labor issues, layoffs, earnings restatements, delayed filings and more.