There’s a sea of interesting public data out there just waiting to be tapped into, but there’s a problem — most people have no earthly idea how to access it. And even if they’re able to make some headway, there’s still an untold number of connections between that data and even more data tucked away in another silo.
That’s where New York startup Enigma comes into play — founded by Hicham Oudghiri and Marc Dacosta and helmed by CEO Jeremy Bronfmann, Enigma taps into over 100,000 public data sources from state and federal records to SEC filings to lists of frozen assets in the United Kingdom all the way to Crunchbase. The end result is an incredibly simple, incredibly smart way to sift through and find connections in publicly available data, and the team has just launched it on our Disrupt stage in New York.
Here’s a quick example: if you punch “Boeing” into the Enigma search box, you’ll be greeted by a handsome high-level view of multiple tables: think FCC licenses, air carrier financials, government spending contracts, lists of company subsidiaries, the works. Those tables can be previewed by hovering over them, and a single click opens the full table for your perusal. From there you could click on a particular plane’s tail number to figure out what carrier operates it and what (if any) accidents it’s been involved in.
But let’s try an example that may hit just a little closer to home. Searching for “Google” brings up yet another extensive list of datasets to dive into, including a listing of the company’s 2012 H-1B visa applications. After jumping into that set, you can run another search to narrow those visa applications down to software engineers only, and determine that the U.S. Department of Labor certified 471 of Google’s H-1B applications for software engineers, and that the average wage was about $122,809.
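For the curious, that kind of drill-down boils down to a filter followed by an aggregate. The snippet below is only a rough sketch of the idea in Python, not Enigma’s code: the rows, the field names (`employer`, `job_title`, `status`, `wage`), and the `summarize` helper are all made up for illustration, and the numbers are not real filings.

```python
from statistics import mean

# Hypothetical rows: field names and values are illustrative, not real filings.
applications = [
    {"employer": "Google Inc.", "job_title": "Software Engineer", "status": "CERTIFIED", "wage": 120000},
    {"employer": "Google Inc.", "job_title": "Software Engineer", "status": "CERTIFIED", "wage": 130000},
    {"employer": "Google Inc.", "job_title": "Product Manager", "status": "CERTIFIED", "wage": 140000},
    {"employer": "Google Inc.", "job_title": "Software Engineer", "status": "DENIED", "wage": 110000},
    {"employer": "Boeing", "job_title": "Software Engineer", "status": "CERTIFIED", "wage": 100000},
]


def summarize(rows, employer, title):
    """Count certified applications matching employer and title, and average their wages."""
    matches = [
        row for row in rows
        if employer.lower() in row["employer"].lower()
        and title.lower() in row["job_title"].lower()
        and row["status"] == "CERTIFIED"
    ]
    if not matches:
        return 0, None  # nothing to average
    return len(matches), mean(row["wage"] for row in matches)


count, average_wage = summarize(applications, "Google", "Software Engineer")
print(count, average_wage)
```

On the toy rows above, two applications survive the filter; on the real dataset the same two steps would yield the certified count and the average wage quoted above.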
That’s all very neat, but how does Enigma do it? The data itself comes from a host of places, but most of Enigma’s government data was obtained by issuing a Freedom of Information Act request to the U.S. General Services Administration for all the top level .gov domains. From there the team uses crawlers to download all the databases it can find, and algorithmically finds connections between all those data points to create a sort of public knowledge graph. Whenever you search for a term on Enigma, Enigma actually searches around that term to figure out and display whatever applicable data sets it can find.
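Enigma hasn’t spelled out how its linking algorithms work, so take the following as a toy illustration of the general idea rather than a description of the real system. This Python sketch assumes the simplest possible approach, exact matching on shared cell values, and every table name, column name, and function in it is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tables standing in for crawled datasets.
tables = {
    "aircraft_registry": [{"tail_number": "N123AB", "operator": "Example Air"}],
    "accident_reports": [{"tail_number": "N123AB", "year": "2009"}],
    "carrier_financials": [{"carrier": "Example Air", "revenue": "1000"}],
}


def build_index(tables):
    """Map each normalized cell value to the (table, column) pairs where it appears."""
    index = defaultdict(set)
    for name, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                index[str(value).strip().lower()].add((name, column))
    return index


def link_tables(index):
    """Connect two tables whenever they share at least one cell value."""
    links = defaultdict(set)
    for value, locations in index.items():
        for (t1, _), (t2, _) in combinations(sorted(locations), 2):
            if t1 != t2:
                links[(t1, t2)].add(value)
    return links


def search(term, index):
    """Return the tables holding any value that contains the search term."""
    term = term.lower()
    return sorted({table for value, locs in index.items() if term in value for table, _ in locs})


index = build_index(tables)
links = link_tables(index)
print(dict(links))
print(search("example air", index))
```

In this sketch, the shared tail number ties the registry to the accident reports, and the shared carrier name ties the registry to the financials, which is the kind of hop described in the Boeing example. A production system would presumably need fuzzier matching and some notion of column types, which this sketch leaves out.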
Enigma shines a light on a simply staggering amount of information, and it certainly doesn’t hurt that the web app was very thoughtfully designed by Raphaël Guilleminot. As impressive (and as pretty) as the web app is, it’s only part of Enigma’s formula. The team looks at Enigma as an infrastructure play as well — it’s their hope that the ability to help users intelligently sift through all this public data will become just another layer of the internet in the next five years. A lofty goal, certainly, but Enigma has managed to strike deals with some very prominent partners.
You’d be right to think that journalists, academics, and finance types would find this sort of broad approach to surfacing hard-to-find data valuable, and the New York startup has locked up partnerships with four startlingly divergent organizations that relate to those fields. Enigma is launching with partnerships with Harvard Business School, research firm Gerson Lehrman Group, S&P Capital IQ, and the New York Times (which recently signed on as a strategic investor too). As these partnerships indicate, Enigma in its current form is mostly an enterprise play (though any data aficionado can sign up for a free trial), but the team hasn’t ruled out a free or freemium version for data buffs to tinker with in the future.
Questions & Answers
Q: Business model?
A: On the API side, Enigma has partnered with S&P Capital IQ and feeds them data to build highly verticalized tools. It also sells subscriptions to the web app, which adds a bunch of premium tools on top of the base experience for paid users.
Q: Does Enigma use every public dataset? Or just select ones?
A: Enigma uses a hybrid approach: it first pulled data from .gov domains and used heuristics to find the good data. The second half is going “door to door” to different agencies to get data that isn’t online.
Q: How big is the market?
A: The market for financial data and services is about $18 billion, and 35% of that (roughly $6.3 billion) is corporate strategy and academics. Enigma will add more markets down the road, and hopes companies like Zillow can build businesses using that data.
Q: Are there any regulatory issues you face?
A: Everything in Enigma is public, but there are limitations. In Philadelphia, for instance, you can’t list property data with names, so Enigma plays by the rules.