Much of GitHub’s power lies in its ability to act not only as a code-sharing and publishing service, but also as a versioning platform. Exversion is launching today at Disrupt Europe in Berlin with the hope of being the GitHub for large-scale data sets. The platform aims to aggregate thousands of open data sets and provide collaboration tools to the data professionals who work with them.
Exversion has three main functions. The startup crawls and indexes the world’s open data, and makes it easily searchable and accessible through a single API. It also provides collaborative data-management tools and serves as a data-publishing platform for enterprises, non-profits, governments, and academia, with a deep focus on the developer market.
Similar to GitHub, Exversion allows users to upload data sets from Excel and share versions via “forking,” which copies a data set from one user’s account to another. This lets you take a data set that you don’t have write access to and modify it under your own account. Professionals can create and manage different versions of the same file, and users can rate, review, and comment on each version.
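For readers unfamiliar with forking, the idea can be sketched in a few lines of Python. This is an illustrative model only, not Exversion’s actual API or storage format: the `Dataset` class, its `commit`, `latest`, and `fork` methods, and the sample rows are all hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical model of a versioned data set (not Exversion's real schema)."""
    owner: str
    name: str
    versions: list = field(default_factory=list)  # each entry is a list of row dicts
    forked_from: Optional[str] = None             # "owner/name" of the source, if any

    def commit(self, rows):
        """Store a new version and return its 1-based version number."""
        self.versions.append(copy.deepcopy(rows))
        return len(self.versions)

    def latest(self):
        """Return a copy of the most recent version."""
        return copy.deepcopy(self.versions[-1])

    def fork(self, new_owner):
        """Copy the latest version into a new data set owned by new_owner."""
        child = Dataset(owner=new_owner, name=self.name,
                        forked_from=f"{self.owner}/{self.name}")
        child.commit(self.latest())
        return child


# Alice publishes a messy data set; the rows are made up for illustration.
original = Dataset(owner="alice", name="nyc-trees")
original.commit([{"borough": "bronx ", "trees": "120"}])

# Bob has no write access to Alice's copy, so he forks it and cleans his own.
mine = original.fork("bob")
cleaned = [{"borough": r["borough"].strip().title(), "trees": int(r["trees"])}
           for r in mine.latest()]
mine.commit(cleaned)
```

In this sketch the fork records where it came from, the cleaned rows become a second version under the new owner, and the original data set is left unchanged.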
Co-founder Jacek Grebski explains that by being able to find and access improved versions of previously unusable data sets, data professionals can save one another hours of preventable grunt work. What differentiates Exversion, he says, is that the startup wants to aggregate publicly available data, distribute proprietary data, and make it all easily machine readable and shareable.
While some have attempted to aggregate public data from across the web, no one has addressed the poor quality of many of these data sets, he says. Data professionals can spend hours cleaning up data sets just to make them usable, but they have few options for sharing the cleaned versions with anyone who needs that data in the future. With Exversion, anyone can access revised data sets (if they are kept public) from fellow professionals who have already worked with them.
In terms of storage, a basic repository starts at 100MB and scales up to 50GB. The startup says that this amount of storage is enough to hold all of NYC’s open data, with some space still left over.
Users can also upload data to a private repository and access, update, and share this data with colleagues. With the collaboration features, teams can access, use, update, and version each data set and track changes to it. The startup explains that developers are also able to build applications using Exversion’s data sets and mash this data up with open data sets of their choosing. When a user uploads data, the startup converts it into an API.
Exversion’s social collaboration tools for open data are free to use, and there is no cost for downloading any versions of the datasets. API users are charged per API call, with tiers for different levels of usage.
In terms of the competition, Socrata is in the space, but focuses on opening up data from governments. Other competitors include DataMarket, Enigma.io, and CKAN. To date, Exversion, which is based in New York, has not raised any outside funding.
Q: How do you make sure the data quality remains high?
A: Our users are watching data sets to make sure they are of high quality.
Q: Have you managed a crowd project?
A: We both come from the data world.
Q: I see how this will be useful in Berlin in the public sector, but I don’t see why a company would use this.