Editor’s note: Yvo Schaap is a 27-year-old entrepreneur from Amsterdam who loves data and code. He’s founder of Directlyrics.com and Fanity.com and has been featured on TechCrunch regarding major security holes or new Google and Facebook products. Follow him on Twitter @yvoschaap.
TechCrunch has always been the most authoritative news outlet on what’s hot in Silicon Valley regarding startup trends, early and late investments, product launches, celeb founders and, of course, geeky drama. Although some frequently praised startups were eventually exposed as fads (Badgeville, Groupon), others were expertly picked early on (Twitter, Airbnb).
It’s time again for end-of-year recaps, so I analyzed TechCrunch’s editorial posts with the aim of exposing this year’s trends in tech. And while I was at it, I went all the way back through the archives to when TechCrunch launched.
I analyzed all 106,664 posts made from January 2006 onwards, looking for interesting data hidden in the individual posts. Quickly my focus moved toward mentions, or more precisely, mentions of firms and people over time that could imply certain trends. I focus on mentions because a startup covered by TechCrunch gains value from its exposure to tech-savvy users, investors and other entrepreneurs. Getting your startup covered is seen as an achievement, so mentions are a valuable key indicator. And what holds for startups also holds for buzzwords, people, brands, products and even old dinosaur tech firms.
6 Fictitious Questions Finally Answered
Quantitative data on its own is a bit dry, so I present my findings as a fun Q&A:
1. Google, Apple or Microsoft?
Obviously Apple has released a lot of products over the past year. But Google (with Android) and, more recently, Microsoft (with its Windows 8 launch) also gave TC’s writers lots to write about. Can my data expose whether TC’s writers are Apple fanboys? I’ve grouped every company’s major products together to find out.
My results show an ongoing fight for attention between Google and Apple: over the past year Google was mentioned 12 percent more often than Apple (1626x vs. 1455x). Microsoft came close to them only in October, when it launched Windows 8 and its Surface tablet, and over the whole year it received only 48 percent as many mentions as Apple.
The big three tech firms competing for attention in TechCrunch’s archives.
Zooming in on operating systems only, Android (482x) got significantly more mentions than iOS (338x). But when I group the actual product names (iPhone and iPad vs. Nexus, Galaxy and Wildfire), Apple products get about 74 percent more mentions (1184x vs. 681x).
In the tablet war, the iPad also wins, followed by the Nexus, Kindle, Note and, in last place, the Nook.
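The grouping of product names per company, and the percentage comparisons quoted in this section, can be sketched roughly as follows. This is a minimal illustration, not the tool used for the research: the alias lists in `COMPANY_ALIASES` and the helper names are assumptions, since the exact terms that were grouped are not listed here. Only the mention counts passed to `percent_more` come from the figures above.

```python
import re
from collections import Counter

# Hypothetical alias map: which names roll up into which company.
# These lists are assumptions made for illustration only.
COMPANY_ALIASES = {
    "Apple": ["Apple", "iPhone", "iPad", "iOS"],
    "Google": ["Google", "Android", "Nexus"],
    "Microsoft": ["Microsoft", "Windows 8", "Surface"],
}

# One case-insensitive pattern per company; \b keeps "iOS" from
# matching inside unrelated words such as "curious".
COMPANY_PATTERNS = {
    company: re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in aliases) + r")\b",
        re.IGNORECASE,
    )
    for company, aliases in COMPANY_ALIASES.items()
}


def count_company_mentions(texts):
    """Count how many texts mention each company or one of its products."""
    counts = Counter()
    for text in texts:
        for company, pattern in COMPANY_PATTERNS.items():
            # A text counts once per company, however many aliases it contains.
            if pattern.search(text):
                counts[company] += 1
    return counts


def percent_more(a, b):
    """Return by how many percent a exceeds b."""
    return (a - b) / b * 100


# Figures quoted in the article: Google vs. Apple, Apple products vs. Android products.
print(round(percent_more(1626, 1455)))  # 12
print(round(percent_more(1184, 681)))   # 74
```

Counting each post at most once per company is a design choice made for this sketch; counting every occurrence instead would favor posts that repeat a product name many times.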
2. What is the most mentioned startup of 2012?
While Google and Apple stuff is interesting, startups create the future.
Thanks to the acquisition of Instagram by Facebook, Instagram became the most talked about startup (peak: 21x in a month). Pinterest peaked at the beginning of 2012 (120x) but declined slowly. The startup whose mentions grew the most compared to 2011 (72x) is Dropbox (111x).
If we compare the mentions of startups vs. incumbents, incumbents get 290 percent more coverage on TechCrunch.
3. What are 2012’s tech fads?
Tech fads are the most fun to dissect. The past year has been the year of the app: in 2007 apps were mentioned 6x a month, while mentions now peak at 149x a month. “Realtime” peaked in 2009 and was almost forgotten in 2012. “Big data” holds strong, together with “cloud.” But the buzzword “hackathon” looks like the biggest grower of 2012.
Looking at tech categories, mobile (1042x) easily beats social (682x), followed by enterprise (288x), which made the biggest jump compared to 2011. Sustainable, green, and tech are following a strong downward slope.
4. What VC firms get the most TechCrunch love?
VC firms use their influence to get coverage for the startups they invest in. The mentions a firm attracts should therefore indicate how much extra value its investment can add in terms of coverage.
From my results it’s obvious that in 2012 Google Ventures (82x) attracted the most attention, followed by Kleiner Perkins (38x), Andreessen Horowitz (38x), Sequoia Capital (37x) and Accel (37x). Looking from 2006 onwards, Google Ventures (180x) and Sequoia Capital (137x) have been the most frequently mentioned VC firms.
5. What incubator generates the most coverage?
Getting your startup noticed in the earliest phases of existence can be vital for getting early feedback and introductions to partnerships. Incubators help get startups off the ground with some cash and advice but also help them get noticed. So how do they perform?
Y Combinator looks like the best bet with 69 mentions for its ~40 2012 startups, followed by 500 Startups (54x). There is little mention of non-U.S. incubators, which is an overall trend on TechCrunch (although they have editors based in Europe).
I also looked into which topics are best for getting coverage. Comparing newsworthy keywords shows launch (1642x) and IPO (346x) receiving the most mentions. No surprise there. The “Deadpool” was most popular in 2007, but was mentioned only 4x in 2012. A site being “down” or a funding round will get you some coverage. Fun facts: a “rumor” would get you noticed in 2008 and 2009 (644x), but in 2012 not so much (48x). Press releases get one mention a year.
6. Who are the most popular tech founders/CEOs?
Surprisingly, 2012 was the year of Google’s Eric Schmidt (26x). In 2011 Steve Jobs was the most mentioned (53x), but Apple’s current CEO, Tim Cook, gets only 18 mentions, almost the same as Twitter co-founder Jack Dorsey (17x). The only woman in the group, Marissa Mayer, gets an award for the most CEO mentions in a single month (18x) in all of TechCrunch’s history.
Of course many more questions could be answered; if you have suggestions, drop them in the comments and I’ll find out.
Being a good netizen, I respected the robots.txt directive when crawling TC’s data. The researched mentions are extracted only from the post title and post snippet. The argument is that if a name isn’t in the first ~100 words, it’s probably irrelevant to the actual post topic. Mentions are extracted with both regex text matching and advanced NLP matching by OpenCalais. With all the data available, I built an interface on top that does all the heavy lifting and outputs pretty charts, making it easy to expose the data needed to answer the questions stated above.
To test whether the data makes sense, I looked at the mention history of the rise-and-fall entities Myspace, Opera and HTC over TechCrunch’s lifetime, and the results looked spot on.
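As a rough illustration of the regex half of this approach, the sketch below counts monthly mentions using only the title and the first ~100 words of each post. It is a simplified sketch under stated assumptions, not the code behind the charts: the post schema (`date`, `title`, `body`) and the function names are hypothetical, and the OpenCalais matching is not reproduced.

```python
import re
from collections import defaultdict

SNIPPET_WORDS = 100  # roughly the first 100 words, per the argument above


def snippet(body, limit=SNIPPET_WORDS):
    """Return the first `limit` whitespace-separated words of a post body."""
    return " ".join(body.split()[:limit])


def build_pattern(term):
    """Whole-word, case-insensitive pattern for a single term."""
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def monthly_mentions(posts, terms):
    """Count posts per month that mention each term in the title or snippet.

    `posts` is an iterable of dicts with the hypothetical keys
    'date' (YYYY-MM-DD), 'title' and 'body'.
    """
    patterns = {term: build_pattern(term) for term in terms}
    counts = {term: defaultdict(int) for term in terms}
    for post in posts:
        text = post["title"] + " " + snippet(post["body"])
        month = post["date"][:7]  # "YYYY-MM"
        for term, pattern in patterns.items():
            # Each post counts at most once per term.
            if pattern.search(text):
                counts[term][month] += 1
    return counts
```

A mention that only appears deep in the body, past the snippet limit, is deliberately ignored here. Note that the `\b` word boundaries assume terms that start and end with a letter or digit, so names ending in punctuation would need a different pattern.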