The Big Data Bottleneck In The Consumer Web

Semil Shah

I am currently an independent consultant working on mobile, growth, and operations with a small handful of early-stage, venture-backed companies. Previously, I spent six (6) months as an EIR with Javelin Venture Partners, a San Francisco-based venture capital firm investing in software startups for consumers and the enterprise, as well as in cloud technologies and infrastructure.

Monday, November 21st, 2011
Photo Credit / Creative Commons An&

Editor’s note: TechCrunch contributor Semil Shah is an entrepreneur interested in digital media, consumer Internet, and social networks. Shah is based in Palo Alto, and you can follow him on Twitter @semil.

Earlier in the year, I wrote an opinion column on TechCrunch that big data “needs to think bigger.” At the time, I kept hearing the term “big data” over and over, and wondered how much of the emerging insights and techniques would be applied toward the Internet versus the larger problems society faces, such as detecting fraud in financial markets, finding new deposits of natural resources, or helping discover the next big pharma drug.

Yet in some of my experiences monitoring the space since then, I’ve come to the conclusion, for now, that my March 2011 column meant well, but that reality is much further behind than we’d like to think. One would assume, for instance, that big drug companies would be aggressive in adopting new, external, cutting-edge techniques to analyze their own data for new insights, especially with a dangerous patent cliff looming in 2012. It turns out that drug companies often aren’t willing to share data with third parties, which is usually necessary to take advantage of big data infrastructure. While I believe that eventually the best data science will emerge to help these industries grow in new ways, for now at least, the best opportunities lie in the one area I wanted to gloss over last time: the consumer and mobile web.

Investors see the wave coming. Over the past few months, the top-tier funds have begun to make their moves. Benchmark Capital brought in Craig Weissman from Salesforce as an EIR and invested in Josh James’ new company, Domo; Accel Partners recently announced the creation of a “Big Data Fund” by reallocating monies from existing funds, which will improve data dealflow; and of course, there’s Greylock Partners, which was one of the earliest investors in this space through numerous companies and, most recently, by recruiting DJ Patil to be their “Data Scientist in Residence.”

Since March, I’ve continued to hear the term “big data” uttered by so many, yet so few seemed to grasp what it means for us and the web (yours truly, included). We all know that the major social networks (like Facebook), broadcast engines (like Twitter), self-expression tools (like Tumblr and Pinterest), and services (like Dropbox) generate ridiculous amounts of data. Add to this the growing Quantified Self movement, where connected devices from companies like Fitbit, Runkeeper, and Jawbone let us track our offline movements and analyze them online.

What happens, then, when the companies holding these big buckets of data go to cash them in?

In the earlier stages of consumer web companies, data can be used to create new products with the hope of increasing engagement metrics. Then, as a company begins to mature, services that ideally generate revenue can be built using the data. In these companies today, data-driven engagement products are oftentimes baked into the earliest versions of the products, such as recommendation engines for whom to follow, where to go, or what to watch.

We should not take data as a given, however. To start with, the FTC has been warning technology executives to collect only the data core to their business. One might be shocked at just how many well-funded, recognizable startups haven’t been collecting good, structured data, and in some cases, they don’t collect any. Those that do get a handle on their data oftentimes do not possess the in-house talent to make sense of it, because the skills required to do so are rare.

The consumer web companies that do interesting things with data are the ones you’d expect: Google, Facebook, Amazon, LinkedIn, and Zynga, among a small group of others. Most web startups don’t have access to people with the mathematical and statistical backgrounds needed to extract value from the data. Some data scientists I’ve talked to will go so far as to say that consumer startups that start to grow fast need a data scientist as part of the core engineering team as soon as possible, because most engineers working in the consumer space don’t have the skills in statistics and/or machine learning required to make sense of the data. (A data scientist is someone sufficiently trained to ask the proper questions of the data in order to tease out insights that serve as the basis for building new products and that, in turn, generate income for the company.)

And herein lies the rub.

What I’m writing isn’t news. Everyone who watches the space knows it. The reality is that this talent is in short supply. To put it in terms we can understand, for every 100 great iPhone engineers, there may be one or two people who can, on their own, dig into consumer web data and discover and build new and engaging services from it.

It’s been my experience that the majority of those who do, in fact, possess these statistical, mathematical, and machine learning skills are currently busy, diligently applying their rare skills in other industries such as finance, life sciences, and the physical sciences. They oftentimes haven’t applied their techniques to data sets culled from the consumer web, nor are they interested in doing so. As a result, there are very, very few people like DJ Patil, Pete Skomoroch (of LinkedIn), or Jeff Hammerbacher (of Cloudera) who truly understand these techniques as they relate to the World Wide Web. Since we can’t clone them, the alternative has been to build data teams of data specialists and pair them with people who have extensive consumer web data experience.

So, the next time you hear someone talk about “big data” in the context of the consumer web, realize that, yes, valuable data, whether big or small, is being collected with every click we make. The big companies with resources are keenly aware of the opportunity, but most web startups don’t have data scientists as part of their early teams, and even if they wanted them, those folks are hard to find. Therefore, it’s my opinion that “big data” is a term we’ll hear for a very long time to come. Data generated by the web will produce some of the largest data sets ever known, if it hasn’t already, and somewhere within all those billions and billions of likes, retweets, upvotes, reblogs, and repins may reside truths that, yet again, change the way we live. But more data scientists will be needed to unlock them.


