Mapillary opens up 25k street-level images to train automotive AI systems

As more companies wade into the business of building artificial intelligence systems to help you drive (or do the driving for you), a startup founded by an ex-Apple computer vision specialist is open sourcing a huge dataset that can help them on their road to autonomy.

Mapillary, a Swedish startup backed by Sequoia, Atomico and others that has built a database of 130 million images through crowdsourcing — think open-source Street View — is releasing a free dataset of 25,000 street-level images from 190 countries, with pixel-level annotations that can be used to train automotive AI systems.

The Mapillary Vistas Dataset claims to be “the world’s largest, most diverse dataset for object recognition on street-level imagery.” As with the rest of Mapillary’s photos, the startup builds its image database on top of Mapbox and OpenStreetMap maps.

The dataset is free for both academic and commercial researchers, but anyone who wants to build the results into commercial products must pay for a commercial license.

As Jan-Erik Solem, the CEO and co-founder, explained, while there are other datasets that companies are using to train the machine learning algorithms for their in-car systems, these fall short because they “do not have enough variability and coverage to be useful in real-world scenarios.”

This Vistas dataset is built on top of regular Mapillary images, where most of the images come from crowdsourcing. “What we have done here is that we manually selected 25,000 images with the variability we wanted from the 130+ million available on Mapillary,” Solem explained. “Then we manually annotated them to label all the pixels in the images. This is a tedious and expensive manual labor process.”
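To make "labeling all the pixels" concrete, here is a minimal sketch of what a pixel-level annotation can look like: a grid the same size as the image, holding one class ID per pixel. The class names, IDs and the tiny 4x6 grid are hypothetical and chosen purely for illustration; the actual Vistas label set and file format are not described here.

```python
from collections import Counter

# Hypothetical class IDs for illustration only; not the real Vistas label set.
CLASS_NAMES = {0: "road", 1: "car", 2: "sky"}

# A toy 4x6 label mask: each entry is the class ID of the pixel at that position.
mask = [
    [2, 2, 2, 2, 2, 2],
    [2, 2, 1, 1, 2, 2],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def class_pixel_counts(label_mask):
    """Count how many pixels carry each class label."""
    counts = Counter(label for row in label_mask for label in row)
    return {CLASS_NAMES[class_id]: n for class_id, n in counts.items()}


counts = class_pixel_counts(mask)
```

Because every pixel carries a label, the per-class counts always add up to the total number of pixels in the image, which is what distinguishes this kind of annotation from drawing a few bounding boxes.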

Expensive, and yet now free to use, because of the companies that are “sponsoring” the work, Solem said.

Sponsors of this dataset are Lyft, Toyota and Daimler, some of which received pre-release data, he added. It’s not clear exactly how these three companies may be using the dataset, beyond making their own autonomous driving systems smarter and more fail-safe.

“This is our main dataset for training our own algorithms. Our own need was one of the reasons for creating this dataset,” Solem noted.

You can see a visual progression of how ordinary pictures are transformed into pixel-level annotated data in the GIF above, and how a final product looks in the image below, ready to plug into your machine learning engine.

Mapillary, it should be pointed out, has yet to reveal much about who its paying customers are these days. “We’re just about to tell the world,” Solem said when I asked him about this, although the three companies sponsoring the Vistas dataset are probably good guesses.

Mapillary describes its wider dataset as one that is used to help build smart cities, future maps, and autonomous vehicles. Using computer vision, Mapillary “reads” images that have been uploaded to its database to identify locations in 3D and recognise and order objects within them.

When we wrote about Mapillary’s most recent funding round — $8 million from Sequoia, Atomico, LDV Capital, and PlayFair in March 2016 — we noted that the company had signed up various organizations to use its data. They included the Swedish town of Helsingborg, Los Angeles County, the World Bank and the Red Cross (although, again, whether these are paying or free users is not clear).

The company does have a set of pricing tiers that point to its B2B focus: the database is free for the first 50,000 views of images with no data requests; $250 per month for up to 500,000 views and 250 data requests; and then priced on a case-by-case basis if you are using more than this.
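Read as a lookup rule, the tiers above can be sketched as follows. This is an illustrative reading rather than Mapillary's actual billing logic: the function name is hypothetical, the limits are assumed to be inclusive, and any data request at all is assumed to move a user out of the free tier.

```python
from typing import Optional


def monthly_price_usd(views: int, data_requests: int) -> Optional[int]:
    """Return the monthly price in USD, or None when pricing is case-by-case.

    Assumptions (not confirmed by the company): limits are inclusive, and
    the free tier allows zero data requests.
    """
    if views < 0 or data_requests < 0:
        raise ValueError("usage counts must be non-negative")
    if views <= 50_000 and data_requests == 0:
        return 0  # free tier
    if views <= 500_000 and data_requests <= 250:
        return 250  # paid tier
    return None  # beyond the published tiers: negotiated individually


free_price = monthly_price_usd(10_000, 0)
paid_price = monthly_price_usd(10_000, 1)
custom_price = monthly_price_usd(600_000, 0)
```

Under these assumptions, a user with light traffic but even one data request lands in the $250 tier, and anything above 500,000 views or 250 data requests falls through to custom pricing.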

“As a business we provide images, data automatically extracted from these images, and processing services for clients that have their own imagery but don’t want to share that on public Mapillary,” Solem said. “Our markets are mapping, automotive, and GIS (Geographical Information Systems). We’re in early stages revenue-wise and 2017 will be a very interesting year for us as a business.”

While crowdsourcing can be a tricky and inconsistent way to build a database, it’s notable that Mapillary’s crowdsourcing is something of a closed loop.

Those who use the platform also contribute to Mapillary’s wider database, which means the system builds stronger datasets exactly in the locations where there is demand at the moment. Other places are left blank for now and will be populated as and when the need arises, not unlike how Waze was built in its early days, well before it was acquired by Google.

“When it comes to image contributions to Mapillary in general, people are self-motivated and contribute because we help them solve problems they have,” Solem explained. “This can be sharing of places and place data, inventory work, mapping work, and map editing.”

Solem has impressive credentials in the area of computer vision. His previous company — the Malmo, Sweden-based facial recognition startup Polar Rose — was quietly acquired by Apple in 2010. He subsequently joined the iPhone maker to work on computer vision and other projects for several years after that.

Mapillary’s existence is an interesting development in the bigger world of digital mapping services. These are used not just as cornerstones in how smartphones work, but are central to how a lot of the next wave of computing is being shaped. That leads to inevitable questions of who should rightfully own this kind of potentially very central and crucial data.

Interestingly, although Solem put in significant time at Apple, potentially one of only a few big commercial players in digital mapping alongside Google and the car consortium that now owns Here (formerly Nokia), he is now singing a different tune when it comes to the creation and long-term ownership of mapping datasets.

“I think it is worrisome that in mapping things are being consolidated into a few players,” he told me back when Mapillary raised its last round of funding. “It’s bad because it means that data moves into silos and very little is shared again. When Apple picks up companies and puts their data into Apple Maps, they disappear. A lot of the data that used to be provided is gone. And Apple has no interest in providing that info to anyone else. There are certain things that you should keep independent.”

Independent, and ready for many and any others to use as they will.