“In the Studio” this week welcomes a former economist who worked for his country’s treasury department and reserve bank, a former intern with The Economist Group, and, according to his unusual company bio page, a former windsurfer and kiteboarder who is now settled in San Francisco as the founder and CEO of one of the most interesting data companies I’ve come across.
Anthony Goldbloom, founder and CEO of Kaggle, had the foresight years ago, while he was in his home country of Australia, to build a community of data scientists from around the world in one place online. At the time, he anticipated a world in which data science applied to the web would be overrun with hollow credentials, and he sought instead to build a new reputation layer through this community. His insight was that data science would certainly be important, but that potential clients need not hire these scientists full-time: why not open up data sets for them to compete on, and let the best algorithms win?
For me, the most interesting aspect of Kaggle is that it is really an online marketplace, one that pairs data wizards with companies that have interesting data and want to become better at predictive marketing, pricing, or modeling. Kaggle calls these pairings “competitions”: client companies release data sets, and the members of the Kaggle community, spread across the world, compete against each other to come up with the algorithm that produces the best result. In this model, Kaggle’s revenues come from larger companies (in insurance or banking, for instance) that don’t want to build big data-science teams in-house and just want to get to the best result, and fast.
In this discussion, Goldbloom and I talk about how he started building this community online from Australia, how the data world thinks about reputation in a time when anyone can give themselves any title, how much of Kaggle’s community is more motivated by competition than by working in-house for a larger company, and how Kaggle’s marketplace approach, built on time-limited competitions, provides value to his larger corporate clients without requiring a heavy commitment from them.
Kaggle is a platform for predictive modeling and analytics competitions. Companies and researchers post their data. Statisticians and data miners from all over the world compete to produce the best models. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modeling task, and it is impossible to know at the outset which technique or analyst will be most effective. How a Kaggle competition works: The competition host prepares the data and...