Microsoft's Project Oxford Gives Developers Access To Facial, Image And Speech-Recognition APIs

Microsoft quietly launched a set of new machine-learning APIs in beta under the “Project Oxford” moniker yesterday, and today its How-Old.net demo for the service went viral.

This site lets you upload photos of faces and then automatically estimates how old the person in each photo is. It's a cool demo and works reasonably well, though, as expected, it makes its fair share of mistakes. Its results are best taken with a grain of salt. And while Microsoft's demo is interesting, the real use case for this technology is probably more along the lines of figuring out whether an image features a child or an adult, for example.
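To make the child-versus-adult use case concrete, here is a minimal sketch that turns an age estimate into a coarse label while leaving room for the kind of mistakes the demo makes. The helper name `classify_age`, the 18-year cutoff and the 5-year margin are all assumptions chosen for illustration, not values from Microsoft.

```python
def classify_age(estimated_age: float, cutoff: float = 18.0, margin: float = 5.0) -> str:
    """Map an estimated age to 'child', 'adult' or 'uncertain'.

    Hypothetical helper: cutoff and margin are illustrative assumptions.
    Estimates within `margin` years of `cutoff` are treated as too close to call.
    """
    if estimated_age < 0:
        raise ValueError("age estimate must be non-negative")
    if estimated_age < cutoff - margin:
        return "child"
    if estimated_age > cutoff + margin:
        return "adult"
    return "uncertain"


if __name__ == "__main__":
    for age in (8, 16, 35):
        print(age, classify_age(age))
```

The "uncertain" band is the point of the design: a raw age guess near the cutoff is exactly where an estimator is least trustworthy, so the caller can route those images to a human instead of trusting the number.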

How-Old uses some — but not all — of the new developer services that are part of “Project Oxford.”

The new APIs also allow developers to add face detection and recognition features to their apps. By default, the face service tries to estimate the age of each person it detects and returns that estimate to developers.
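On the developer side, a detection result like this would typically arrive as JSON and need to be turned into typed objects. The sketch below assumes a response shape with a `faceRectangle` object and an optional `attributes.age` field; that schema, the `DetectedFace` class and `parse_faces` are hypothetical illustrations, not Project Oxford's documented format.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedFace:
    """One face from a detection response (field names are assumptions)."""
    left: int
    top: int
    width: int
    height: int
    age: Optional[float]  # None when the response carries no age estimate


def parse_faces(payload: str) -> List[DetectedFace]:
    """Parse a hypothetical JSON array of detected faces."""
    faces: List[DetectedFace] = []
    for item in json.loads(payload):
        rect = item["faceRectangle"]
        # Age is optional in this assumed schema, so fall back to None.
        age = item.get("attributes", {}).get("age")
        faces.append(
            DetectedFace(rect["left"], rect["top"], rect["width"], rect["height"], age)
        )
    return faces


SAMPLE = (
    '[{"faceRectangle": {"left": 10, "top": 20, "width": 64, "height": 64},'
    ' "attributes": {"age": 31.5}}]'
)

if __name__ == "__main__":
    for face in parse_faces(SAMPLE):
        print(face)
```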

As Ryan Galgon, a senior program manager on the Oxford project at Microsoft Technology and Research, told me at Microsoft's Build developer conference today, Oxford and the age-detection project were the result of a major collaboration between different groups inside Microsoft. Much of what's available through the service today is based on modern deep learning techniques the company has worked on over the last few years.

Right out of the box, the face API also offers face detection in images, face verification to check whether two faces belong to the same person, and the ability to find similar-looking faces.
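The article doesn't say how Oxford compares faces internally. A common approach is to represent each face as a numeric feature vector and compare vectors by cosine similarity: verification becomes a threshold test, and similar-face search becomes ranking by score. The sketch below shows only that generic technique; the toy vectors, the 0.8 threshold and the function names are assumptions, not Microsoft's method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def same_person(a: Vector, b: Vector, threshold: float = 0.8) -> bool:
    """Verification: treat two faces as the same person above an assumed threshold."""
    return cosine_similarity(a, b) >= threshold


def most_similar(query: Vector, gallery: Dict[str, Vector], k: int = 1) -> List[Tuple[str, float]]:
    """Similar-face search: rank gallery entries by similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(most_similar(query, gallery))
    print(same_person(query, gallery["alice"]), same_person(query, gallery["bob"]))
```

In a real system the vectors would come from a trained model and the threshold would be tuned on labeled pairs; the comparison logic itself stays this simple.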

Other tools include speech recognition, and over time, the service will be able to help developers understand their users' intent. The project also features a vision API for automatically categorizing images and creating smart crops that keep the subject in the center of the cropped image.

These three services (face, speech and vision) are now available as a public beta. There's also a fourth API that lets developers build custom language understanding into their applications.

Previously, Microsoft offered a set of somewhat similar APIs under the Bing brand. Bing offers speech and translator APIs, for example, but for the most part, these Bing services are more basic and search-focused than the Project Oxford tools. Galgon also told me that the Bing APIs were more focused on the Windows desktop experience. The Project Oxford tools, on the other hand, are available as RESTful APIs (with a limit of 5,000 calls per month).
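With a cap of 5,000 calls per month, an app may want to track its own usage rather than discover the limit through failed requests. Here is a minimal client-side counter, assuming the quota resets on the first of each calendar month; that reset rule and the `MonthlyQuota` class are illustrative assumptions, and the service itself remains the authority on the real limit.

```python
from datetime import date
from typing import Optional, Tuple


class MonthlyQuota:
    """Client-side counter for a per-calendar-month call limit.

    Hypothetical helper: it assumes the quota resets on the first of each
    calendar month. The server remains the authority on the real limit.
    """

    def __init__(self, limit: int = 5000) -> None:
        self.limit = limit
        self._month: Optional[Tuple[int, int]] = None
        self._used = 0

    def _roll(self, today: date) -> None:
        # Reset the counter when a new (year, month) begins.
        month = (today.year, today.month)
        if month != self._month:
            self._month = month
            self._used = 0

    def try_acquire(self, today: Optional[date] = None) -> bool:
        """Record one call and return True, or return False if the quota is spent."""
        self._roll(today or date.today())
        if self._used >= self.limit:
            return False
        self._used += 1
        return True

    def remaining(self, today: Optional[date] = None) -> int:
        """Calls left in the current month."""
        self._roll(today or date.today())
        return self.limit - self._used
```

Usage: call `try_acquire()` before each REST request and skip or queue the request when it returns `False`. The optional `today` argument exists mainly so the month rollover can be tested deterministically.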

The Speech API, as the name implies, offers speech recognition for speech-to-text conversion, as well as a text-to-speech service that turns written text into audio. More interestingly, though, it also features intent recognition. The idea here is to allow applications to understand the speaker's intent (order a burrito, cancel a flight, etc.). This is driven by the project's Language Understanding Intelligent Service.
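To illustrate what "intent" means as an output, here is a deliberately toy keyword matcher that maps a transcribed utterance to an intent label. This is not how the Language Understanding Intelligent Service works (it relies on trained models rather than fixed keyword lists); the intent names `OrderFood` and `CancelFlight` and their keyword sets are hypothetical.

```python
import re
from typing import Dict, FrozenSet, Optional

# Hypothetical intent catalogue: an intent matches when all its keywords appear.
INTENTS: Dict[str, FrozenSet[str]] = {
    "OrderFood": frozenset({"order", "burrito"}),
    "CancelFlight": frozenset({"cancel", "flight"}),
}


def recognize_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose keywords all occur in the utterance, else None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keywords in INTENTS.items():
        if keywords <= words:  # subset test: every keyword is present
            return intent
    return None


if __name__ == "__main__":
    for text in ("I'd like to order a burrito, please", "Please cancel my flight to Boston"):
        print(text, "->", recognize_intent(text))
```

The toy version breaks on paraphrases ("book me a burrito", "flights"), which is precisely the gap a learned language-understanding service is meant to close.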

Using the vision API, developers can categorize images to filter out adult content, for example, or simply apply tags to images automatically or group them into clusters. The API also features optical character recognition and lets developers crop images automatically by recognizing what's important in an image and keeping it in the center of the crop.
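The recognition step is the hard part, and the article doesn't describe it. Once a subject's bounding box is known, though, the cropping itself is simple geometry: center the crop window on the subject, then shift it back inside the image if it spills over an edge. The sketch below assumes the bounding box is already available (for example, from face detection); `Box` and `centered_crop` are hypothetical names.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def centered_crop(image_w: int, image_h: int, subject: Box, crop_w: int, crop_h: int) -> Box:
    """Return a crop_w x crop_h window centered on `subject`, shifted to stay inside the image."""
    if crop_w > image_w or crop_h > image_h:
        raise ValueError("crop must fit inside the image")
    # Center of the subject's bounding box.
    cx = subject.left + subject.width / 2
    cy = subject.top + subject.height / 2
    # Ideal top-left corner that puts the subject at the center of the crop.
    left = round(cx - crop_w / 2)
    top = round(cy - crop_h / 2)
    # Clamp so the window never extends past the image borders.
    left = min(max(left, 0), image_w - crop_w)
    top = min(max(top, 0), image_h - crop_h)
    return Box(left, top, crop_w, crop_h)
```

For an 800 x 600 image with a subject at `Box(500, 200, 100, 100)`, a 300 x 300 crop lands at `Box(400, 100, 300, 300)` with the subject dead center. A subject near an edge can't be perfectly centered; clamping trades exact centering for a crop that stays within the picture.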

For now, the service is available for free. It’s unclear when Microsoft plans to charge for access, but Galgon told me that the company is committed to evolving these services over time.

Even if you’re not a developer, you can give some of these features a try here.