CrowdFlower Launches Open Data Project Covering Everything From Climate Change To #ThatDress

Crowdsourcing company CrowdFlower allows businesses to tap into a distributed workforce of 5 million contributors for basic tasks like sentiment analysis. Today it’s releasing some of the resulting data to the public through its new Data for Everyone initiative.

Founder and CEO Lukas Biewald (a friend of mine from college) told me that last year, the company quietly began asking some of its customers if they were willing to make the data they gathered through CrowdFlower public, and now it’s officially launching the initiative with its first batch of data sets.

Biewald said this grew out of his own frustrations about the lack of open data during his time as a grad student and as a scientist at search startup Powerset. His hope is to turn CrowdFlower into a central repository where open data can be found by researchers and entrepreneurs. (Factual was another startup trying to become a hub for open data, though in recent years, it’s become more focused on gathering location data to power mobile ads.)

The company has also changed its pricing to reflect this goal — it won’t charge a licensing fee for customers who are willing to share their data (they still have to pay their contributors to actually gather that data, though).

As for the data that’s available now, well, it’s an interesting peek at how people have been using CrowdFlower. There’s a lot of Twitter sentiment analysis covering everything from attitudes towards brands and products to yogurt (?) and climate change. Among the more recent data sets, I was particularly taken with the gender breakdown of who’s been on the cover of Time magazine and, yes, the analysis of who thought the dress (you know the one) was gold and white versus blue and black.

As for whether there are any privacy risks in releasing this data (CrowdFlower has been used to analyze some interesting stuff), Biewald said the company is avoiding them by hand-picking the data sets.