“I believe CrunchBase will gain a lot of attention from the academia soon, which is always eager for high-quality data set,” writes Guang Xiang of Carnegie Mellon University, who found that he could predict mergers and acquisitions much better using the unique business variables available in CrunchBase than with the traditional databases used by academics. Thanks, Xiang. Flattery will get you everywhere.
“Traditionally, people only used numeric variables/features for M&A prediction, such as ROI, etc. CrunchBase and TechCrunch provided a much richer corpus for the task,” he writes. Specifically, CrunchBase gave him data on roughly 43 times as many companies as the normal dataset (more than 100,000 vs. 2,300) and access to valuable variables, such as management structure, financing, and media coverage.
For instance, “Strong financial backing is generally considered critical to the success of a company,” but traditional datasets won’t have detailed information on the management, their experience, and the funding rounds.
Even better, the news coverage itself on TechCrunch could also be a predictor of a merger or acquisition (because, well, duh, if a company’s doing well enough to make the news, there’s a good chance someone is also itching to buy it out).
But just when we were starting to blush, Xiang brought out the criticism: “Despite its large magnitude, the CrunchBase corpus is sparse with many missing attributes,” because the community-created database tends to focus on more popular companies and features. That said, even with the drawbacks, the researchers still achieved “good performance” with CrunchBase, which, impressively enough, has been managed all these years by superwoman Gene Teare.
M&A activity is just the tip of the iceberg, and there are all sorts of business questions that could be answered using the vast amounts of data provided by CrunchBase. So, statisticians and business analysts, go nuts. And, when you find something cool, let us know first (firstname.lastname@example.org).