Microsoft demos next-generation image-captioning Captionbot

The power of the cloud is a bit fuzzy to most of us, but Microsoft wants to improve that by giving developers a series of API tools. The suite, dubbed Cognitive Services, empowers developers to make their software far smarter, including tools for trainable speech-to-text processing and a quality of object recognition verging on actual magic.

Drizzle a bit of API-enabled artificial intelligence on your applications with Microsoft’s new Cognitive Services.

Under the slogan “Give your apps a human side,” Cognitive Services is a collection of APIs for developers to use in their applications. Two examples were demoed at the Build conference. One is a brand-new object recognition engine, which is likely to replace Project Oxford. To demo what this API can do, Microsoft created Captionbot.ai, which is tremendously addictive (and science-fiction-grade awesome).

The other API demoed at the conference is a set of custom voice-recognition tools for audio transcription, designed to recognize low-grade audio. It covers use cases where developers need to train voice recognition beyond its standard levels while retaining the underlying voice recognition engine. Examples might include transcribing speakers with heavy accents, children, people with speech impediments, or audio with a particular type of background noise, such as recordings captured at a drive-through with highway noise.

Of all the things demoed at Build, Cognitive Services was the thing that felt the most like it was nudging us just that little bit further into the future, and I can’t wait to see what developers do with the tech.