Wikipedia’s Next Big Thing: Wikidata, A Machine-Readable, User-Editable Database Funded By Google, Paul Allen And Others

Comment

Image Credits:

Wikidata, the first new project to emerge from the Wikimedia Foundation since 2006, is now beginning development. The organization, known best for its user-edited encyclopedia of knowledge Wikipedia, recently announced the new project at February’s Semantic Tech & Business Conference in Berlin, describing Wikidata as new effort to provide a database of knowledge that can be read and edited by humans and machines alike.

There have been other attempts at creating a semantic database built from Wikipedia’s data before – for example, DBpedia, a community effort to extract structured content from Wikipedia and make it available online. The difference is that, with Wikidata, the data won’t just be made available, it will also be made editable by anyone.

The project’s goal in developing a semantic, machine-readable database doesn’t just help push the web forward, it also helps Wikipedia itself. The data will bring all the localized versions of Wikipedia on par with each other in terms of the basic facts they house. Today, the English, German, French and Dutch versions offer the most coverage, with other languages falling much further behind.

Wikidata will also enable users to ask different types of questions, like which of the world’s ten largest cities have a female mayor?, for example. Queries like this are today answered by user-created Wikipedia Lists – that is, manually created structured answers. Wikidata, on the hand, will be able to create these lists automatically.

The initial effort to create Wikidata is being led by the German chapter of Wikimedia, Wikimedia Deutschland, whose CEO Pavel Richter calls the project “ground-breaking,” and describes it as “the largest technical project ever undertaken by one of the 40 international Wikimedia chapters.” Much of the early experimentation which resulted in the Wikidata concept was done in Germany, which is why it’s serving as the base of operations for the new undertaking.

The German Chapter will perform the initial development involved in the creation of Wikidata, but will later hand over the operation and maintenance to the Wikimedia Foundation when complete. The estimation is that hand-off will occur a year from now, in March 2013.

The overall project will have three phases, the first of which involves creating one Wikidata page for each Wikipedia entry across Wikipedia’s over 280 supported languages. This will provide the online encyclopedia with one common source of structured data that can be used in all articles, no matter which language they’re in. For example, the date of someone’s birth would be recorded and maintained in one place: Wikidata. Phase one will also involve centralizing the links between the different language versions of Wikipedia. This part of the work will be finished by August 2012.

In phase two, editors will be able to add and use data in Wikidata, and this will be available by December 2012. Finally, phase three will allow for the automatic creation of lists and charts based on the data in Wikidata, which can then populate the pages of Wikipedia.

In terms of how Wikidata will impact Wikipedia’s user interface, the plan is for the data to live in the “info boxes” that run down the right-hand side of a Wikipedia page. (For example: those on the right side of NYC’s page). The data will be inputted at data.wikipedia.org, which will then drive the info boxes wherever they appear, across languages, and in other pages that use the same info boxes. However, because the project is just now going into development, some of these details may change.

Below, an early concept for Wikidata:

All the data contained in Wikidata will be published under a free Creative Commons license, which opens it up for use by any number of external applications, including e-government, the sciences and more.

Dr. Denny Vrandečić, who joined Wikimedia from the Karlsruhe Institute of Technology, is leading a team of eight developers to build Wikidata, and is joined by Dr. Markus Krötzsch of the University of Oxford. Krötzsch and Vrandečić, notably, were both co-founders of the Semantic MediaWiki project, which pursued similar goals to that of Wikidata over the past few years.

The initial development of Wikidata is being funded through a donation of 1.3 million Euros, granted in half by the Allen Institute for Artificial Intelligence, an organization established by Microsoft co-founder Paul Allen in 2010. The goal of the Institute is to support long-range research activities that have the potential to accelerate progress in artificial intelligence, which includes web semantics.

“Wikidata will build on semantic technology that we have long supported, will accelerate the pace of scientific discovery, and will create an extraordinary new data resource for the world,” says Dr. Mark Greaves, VP of the Allen Institute.

Another quarter of the funding comes from the Gordon and Betty Moore Foundation, through its Science program, and another quarter comes from Google. According to Google’s Director of Open Source, Chris DiBona, Google hopes that Wikidata will make large amounts of structured data available to “all.” (All, meaning, course, to Google itself, too.)

This ties back to all those vague reports of “major changes” coming to Google’s search engine in the coming months, seemingly published far ahead of any actual news (like this), possibly in a bit of a PR push to take the focus off the growing criticism surrounding Google+…or possibly to simply tease the news by educating the public about what the “semantic web” is.

Google, which stated it would be increasing its efforts at providing direct answers to common queries – like those with a specific, factual piece of data – could obviously build greatly on top of something like Wikidata. As it moves further into semantic search, it could provide details about the people, places and things its users search for. It would actually know what things are, whether birth dates, locations, distances, sizes, temperatures, etc., and also how they’re connected to other points of data. Google previously said it expects semantic search changes to impact 10% to 20% of queries. (Google declined to provide any on the record comment regarding its future plans in this area).

Ironically, the results of Wikidata’s efforts may then actually mean fewer Google referrals to Wikipedia pages. Short answers could be provided by Google itself, positioned at the top of the search results. The need to click through to read full Wikipedia articles (or any articles, for that matter) would be reduced, leading Google users to spend more time on Google.

More TechCrunch

All cars suffer when the mercury drops, but electric vehicles suffer more than most as heaters draw more power and batteries charge more slowly as the liquid electrolyte inside thickens.…

Porsche invests in battery startup South 8 to boost cold-weather EV performance

Scale AI has raised a $1 billion Series F round from a slew of big-name institutional and corporate investors including Amazon and Meta.

Data-labeling startup Scale AI raises $1B as valuation doubles to $13.8B

The new coalition, Tech Against Scams, will work together to find ways to fight back against the tools used by scammers and to better educate the public against financial scams.

Meta, Match, Coinbase and others team up to fight online fraud and crypto scams

It’s a wrap: European Union lawmakers have given the final approval to set up the bloc’s flagship, risk-based regulations for artificial intelligence.

EU Council gives final nod to set up risk-based regulations for AI

London-based fintech Vitesse has closed a $93 million Series C round of funding led by investment giant KKR.

Vitesse, a payments and treasury management platform for insurers, raises $93M to fuel US expansion

Zen Educate, an online marketplace that connects schools with teachers, has raised $37 million in a Series B round of funding. The raise comes amid a growing teacher shortage crisis…

Zen Educate raises $37M and acquires Aquinas Education as it tries to address the teacher shortage

“When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine.”

Scarlett Johansson says that OpenAI approached her to use her voice

A new self-driving truck — manufactured by Volvo and loaded with autonomous vehicle tech developed by Aurora Innovation — could be on public highways as early as this summer.  The…

Aurora and Volvo unveil self-driving truck designed for a driverless future

The European venture capital firm raised its fourth fund as fund as climate tech “comes of age.”

ETF Partners raises €285M for climate startups that will be effective quickly — not 20 years down the road

Copilot, Microsoft’s brand of generative AI, will soon be far more deeply integrated into the Windows 11 experience.

Microsoft wants to make Windows an AI operating system, launches Copilot+ PCs

Hello and welcome back to TechCrunch Space. For those who haven’t heard, the first crewed launch of Boeing’s Starliner capsule has been pushed back yet again to no earlier than…

TechCrunch Space: Star(side)liner

When I attended Automate in Chicago a few weeks back, multiple people thanked me for TechCrunch’s semi-regular robotics job report. It’s always edifying to get that feedback in person. While…

These 81 robotics companies are hiring

The top vehicle safety regulator in the U.S. has launched a formal probe into an April crash involving the all-electric VinFast VF8 SUV that claimed the lives of a family…

VinFast crash that killed family of four now under federal investigation

When putting a video portal in a public park in the middle of New York City, some inappropriate behavior will likely occur. The Portal, the vision of Lithuanian artist and…

NYC-Dublin real-time video portal reopens with some fixes to prevent inappropriate behavior

Longtime New York-based seed investor, Contour Venture Partners, is making progress on its latest flagship fund after lowering its target. The firm closed on $42 million, raised from 64 backers,…

Contour Venture Partners, an early investor in Datadog and Movable Ink, lowers the target for its fifth fund

Meta’s Oversight Board has now extended its scope to include the company’s newest platform, Instagram Threads, and has begun hearing cases from Threads.

Meta’s Oversight Board takes its first Threads case

The company says it’s refocusing and prioritizing fewer initiatives that will have the biggest impact on customers and add value to the business.

SeekOut, a recruiting startup last valued at $1.2 billion, lays off 30% of its workforce

The U.K.’s self-proclaimed “world-leading” regulations for self-driving cars are now official, after the Automated Vehicles (AV) Act received royal assent — the final rubber stamp any legislation must go through…

UK’s autonomous vehicle legislation becomes law, paving the way for first driverless cars by 2026

ChatGPT, OpenAI’s text-generating AI chatbot, has taken the world by storm. What started as a tool to hyper-charge productivity through writing essays and code with short text prompts has evolved…

ChatGPT: Everything you need to know about the AI-powered chatbot

SoLo Funds CEO Travis Holoway: “Regulators seem driven by press releases when they should be motivated by true consumer protection and empowering equitable solutions.”

Fintech lender SoLo Funds is being sued again by the government over its lending practices

Hard tech startups generate a lot of buzz, but there’s a growing cohort of companies building digital tools squarely focused on making hard tech development faster, more efficient and —…

Rollup wants to be the hardware engineer’s workhorse

TechCrunch Disrupt 2024 is not just about groundbreaking innovations, insightful panels, and visionary speakers — it’s also about listening to YOU, the audience, and what you feel is top of…

Disrupt Audience Choice vote closes Friday

Google says the new SDK would help Google expand on its core mission of connecting the right audience to the right content at the right time.

Google is launching a new Android feature to drive users back into their installed apps

Jolla has taken the official wraps off the first version of its personal server-based AI assistant in the making. The reborn startup is building a privacy-focused AI device — aka…

Jolla debuts privacy-focused AI hardware

The ChatGPT mobile app’s net revenue first jumped 22% on the day of the GPT-4o launch and continued to grow in the following days.

ChatGPT’s mobile app revenue saw its biggest spike yet following GPT-4o launch

Dating app maker Bumble has acquired Geneva, an online platform built around forming real-world groups and clubs. The company said that the deal is designed to help it expand its…

Bumble buys community building app Geneva to expand further into friendships

CyberArk — one of the army of larger security companies founded out of Israel — is acquiring Venafi, a specialist in machine identity, for $1.54 billion. 

CyberArk snaps up Venafi for $1.54B to ramp up in machine-to-machine security

Founder-market fit is one of the most crucial factors in a startup’s success, and operators (someone involved in the day-to-day operations of a startup) turned founders have an almost unfair advantage…

OpenseedVC, which backs operators in Africa and Europe starting their companies, reaches first close of $10M fund

A Singapore High Court has effectively approved Pine Labs’ request to shift its operations to India.

Pine Labs gets Singapore court approval to shift base to India

The AI Safety Institute, a U.K. body that aims to assess and address risks in AI platforms, has said it will open a second location in San Francisco. 

UK opens office in San Francisco to tackle AI risk