Wikipedia’s Next Big Thing: Wikidata, A Machine-Readable, User-Editable Database Funded By Google, Paul Allen And Others

Comment

Image Credits:

Wikidata, the first new project to emerge from the Wikimedia Foundation since 2006, is now beginning development. The organization, known best for its user-edited encyclopedia of knowledge Wikipedia, recently announced the new project at February’s Semantic Tech & Business Conference in Berlin, describing Wikidata as new effort to provide a database of knowledge that can be read and edited by humans and machines alike.

There have been other attempts at creating a semantic database built from Wikipedia’s data before – for example, DBpedia, a community effort to extract structured content from Wikipedia and make it available online. The difference is that, with Wikidata, the data won’t just be made available, it will also be made editable by anyone.

The project’s goal in developing a semantic, machine-readable database doesn’t just help push the web forward, it also helps Wikipedia itself. The data will bring all the localized versions of Wikipedia on par with each other in terms of the basic facts they house. Today, the English, German, French and Dutch versions offer the most coverage, with other languages falling much further behind.

Wikidata will also enable users to ask different types of questions, like which of the world’s ten largest cities have a female mayor?, for example. Queries like this are today answered by user-created Wikipedia Lists – that is, manually created structured answers. Wikidata, on the hand, will be able to create these lists automatically.

The initial effort to create Wikidata is being led by the German chapter of Wikimedia, Wikimedia Deutschland, whose CEO Pavel Richter calls the project “ground-breaking,” and describes it as “the largest technical project ever undertaken by one of the 40 international Wikimedia chapters.” Much of the early experimentation which resulted in the Wikidata concept was done in Germany, which is why it’s serving as the base of operations for the new undertaking.

The German Chapter will perform the initial development involved in the creation of Wikidata, but will later hand over the operation and maintenance to the Wikimedia Foundation when complete. The estimation is that hand-off will occur a year from now, in March 2013.

The overall project will have three phases, the first of which involves creating one Wikidata page for each Wikipedia entry across Wikipedia’s over 280 supported languages. This will provide the online encyclopedia with one common source of structured data that can be used in all articles, no matter which language they’re in. For example, the date of someone’s birth would be recorded and maintained in one place: Wikidata. Phase one will also involve centralizing the links between the different language versions of Wikipedia. This part of the work will be finished by August 2012.

In phase two, editors will be able to add and use data in Wikidata, and this will be available by December 2012. Finally, phase three will allow for the automatic creation of lists and charts based on the data in Wikidata, which can then populate the pages of Wikipedia.

In terms of how Wikidata will impact Wikipedia’s user interface, the plan is for the data to live in the “info boxes” that run down the right-hand side of a Wikipedia page. (For example: those on the right side of NYC’s page). The data will be inputted at data.wikipedia.org, which will then drive the info boxes wherever they appear, across languages, and in other pages that use the same info boxes. However, because the project is just now going into development, some of these details may change.

Below, an early concept for Wikidata:

All the data contained in Wikidata will be published under a free Creative Commons license, which opens it up for use by any number of external applications, including e-government, the sciences and more.

Dr. Denny Vrandečić, who joined Wikimedia from the Karlsruhe Institute of Technology, is leading a team of eight developers to build Wikidata, and is joined by Dr. Markus Krötzsch of the University of Oxford. Krötzsch and Vrandečić, notably, were both co-founders of the Semantic MediaWiki project, which pursued similar goals to that of Wikidata over the past few years.

The initial development of Wikidata is being funded through a donation of 1.3 million Euros, granted in half by the Allen Institute for Artificial Intelligence, an organization established by Microsoft co-founder Paul Allen in 2010. The goal of the Institute is to support long-range research activities that have the potential to accelerate progress in artificial intelligence, which includes web semantics.

“Wikidata will build on semantic technology that we have long supported, will accelerate the pace of scientific discovery, and will create an extraordinary new data resource for the world,” says Dr. Mark Greaves, VP of the Allen Institute.

Another quarter of the funding comes from the Gordon and Betty Moore Foundation, through its Science program, and another quarter comes from Google. According to Google’s Director of Open Source, Chris DiBona, Google hopes that Wikidata will make large amounts of structured data available to “all.” (All, meaning, course, to Google itself, too.)

This ties back to all those vague reports of “major changes” coming to Google’s search engine in the coming months, seemingly published far ahead of any actual news (like this), possibly in a bit of a PR push to take the focus off the growing criticism surrounding Google+…or possibly to simply tease the news by educating the public about what the “semantic web” is.

Google, which stated it would be increasing its efforts at providing direct answers to common queries – like those with a specific, factual piece of data – could obviously build greatly on top of something like Wikidata. As it moves further into semantic search, it could provide details about the people, places and things its users search for. It would actually know what things are, whether birth dates, locations, distances, sizes, temperatures, etc., and also how they’re connected to other points of data. Google previously said it expects semantic search changes to impact 10% to 20% of queries. (Google declined to provide any on the record comment regarding its future plans in this area).

Ironically, the results of Wikidata’s efforts may then actually mean fewer Google referrals to Wikipedia pages. Short answers could be provided by Google itself, positioned at the top of the search results. The need to click through to read full Wikipedia articles (or any articles, for that matter) would be reduced, leading Google users to spend more time on Google.

More TechCrunch

Companies are always looking for an edge, and searching for ways to encourage their employees to innovate. One way to do that is by running an internal hackathon around a…

Why companies are turning to internal hackathons

Featured Article

I’m rooting for Melinda French Gates to fix tech’s broken ‘brilliant jerk’ culture

Women in tech still face a shocking level of mistreatment at work. Melinda French Gates is one of the few working to change that.

4 hours ago
I’m rooting for Melinda French Gates to fix tech’s  broken ‘brilliant jerk’ culture

Blue Origin has successfully completed its NS-25 mission, resuming crewed flights for the first time in nearly two years. The mission brought six tourist crew members to the edge of…

Blue Origin successfully launches its first crewed mission since 2022

Creative Artists Agency (CAA), one of the top entertainment and sports talent agencies, is hoping to be at the forefront of AI protection services for celebrities in Hollywood. With many…

Hollywood agency CAA aims to help stars manage their own AI likenesses

Expedia says Rathi Murthy and Sreenivas Rachamadugu, respectively its CTO and senior vice president of core services product & engineering, are no longer employed at the travel booking company. In…

Expedia says two execs dismissed after ‘violation of company policy’

Welcome back to TechCrunch’s Week in Review. This week had two major events from OpenAI and Google. OpenAI’s spring update event saw the reveal of its new model, GPT-4o, which…

OpenAI and Google lay out their competing AI visions

When Jeffrey Wang posted to X asking if anyone wanted to go in on an order of fancy-but-affordable office nap pods, he didn’t expect the post to go viral.

With AI startups booming, nap pods and Silicon Valley hustle culture are back

OpenAI’s Superalignment team, responsible for developing ways to govern and steer “superintelligent” AI systems, was promised 20% of the company’s compute resources, according to a person from that team. But…

OpenAI created a team to control ‘superintelligent’ AI — then let it wither, source says

A new crop of early-stage startups — along with some recent VC investments — illustrates a niche emerging in the autonomous vehicle technology sector. Unlike the companies bringing robotaxis to…

VCs and the military are fueling self-driving startups that don’t need roads

When the founders of Sagetap, Sahil Khanna and Kevin Hughes, started working at early-stage enterprise software startups, they were surprised to find that the companies they worked at were trying…

Deal Dive: Sagetap looks to bring enterprise software sales into the 21st century

Keeping up with an industry as fast-moving as AI is a tall order. So until an AI can do it for you, here’s a handy roundup of recent stories in the world…

This Week in AI: OpenAI moves away from safety

After Apple loosened its App Store guidelines to permit game emulators, the retro game emulator Delta — an app 10 years in the making — hit the top of the…

Adobe comes after indie game emulator Delta for copying its logo

Meta is once again taking on its competitors by developing a feature that borrows concepts from others — in this case, BeReal and Snapchat. The company is developing a feature…

Meta’s latest experiment borrows from BeReal’s and Snapchat’s core ideas

Welcome to Startups Weekly! We’ve been drowning in AI news this week, with Google’s I/O setting the pace. And Elon Musk rages against the machine.

Startups Weekly: It’s the dawning of the age of AI — plus,  Musk is raging against the machine

IndieBio’s Bay Area incubator is about to debut its 15th cohort of biotech startups. We took special note of a few, which were making some major, bordering on ludicrous, claims…

IndieBio’s SF incubator lineup is making some wild biotech promises

YouTube TV has announced that its multiview feature for watching four streams at once is now available on Android phones and tablets. The Android launch comes two months after YouTube…

YouTube TV’s ‘multiview’ feature is now available on Android phones and tablets

Featured Article

Two Santa Cruz students uncover security bug that could let millions do their laundry for free

CSC ServiceWorks provides laundry machines to thousands of residential homes and universities, but the company ignored requests to fix a security bug.

2 days ago
Two Santa Cruz students uncover security bug that could let millions do their laundry for free

TechCrunch Disrupt 2024 is just around the corner, and the buzz is palpable. But what if we told you there’s a chance for you to not just attend, but also…

Harness the TechCrunch Effect: Host a Side Event at Disrupt 2024

Decks are all about telling a compelling story and Goodcarbon does a good job on that front. But there’s important information missing too.

Pitch Deck Teardown: Goodcarbon’s $5.5M seed deck

Slack is making it difficult for its customers if they want the company to stop using its data for model training.

Slack under attack over sneaky AI training policy

A Texas-based company that provides health insurance and benefit plans disclosed a data breach affecting almost 2.5 million people, some of whom had their Social Security number stolen. WebTPA said…

Healthcare company WebTPA discloses breach affecting 2.5 million people

Featured Article

Microsoft dodges UK antitrust scrutiny over its Mistral AI stake

Microsoft won’t be facing antitrust scrutiny in the U.K. over its recent investment into French AI startup Mistral AI.

2 days ago
Microsoft dodges UK antitrust scrutiny over its Mistral AI stake

Ember has partnered with HSBC in the U.K. so that the bank’s business customers can access Ember’s services from their online accounts.

Embedded finance is still trendy as accounting automation startup Ember partners with HSBC UK

Kudos uses AI to figure out consumer spending habits so it can then provide more personalized financial advice, like maximizing rewards and utilizing credit effectively.

Kudos lands $10M for an AI smart wallet that picks the best credit card for purchases

The EU’s warning comes after Microsoft failed to respond to a legally binding request for information that focused on its generative AI tools.

EU warns Microsoft it could be fined billions over missing GenAI risk info

The prospects for troubled banking-as-a-service startup Synapse have gone from bad to worse this week after a United States Trustee filed an emergency motion on Wednesday.  The trustee is asking…

A US Trustee wants troubled fintech Synapse to be liquidated via Chapter 7 bankruptcy, cites ‘gross mismanagement’

U.K.-based Seraphim Space is spinning up its 13th accelerator program, with nine participating companies working on a range of tech from propulsion to in-space manufacturing and space situational awareness. The…

Seraphim’s latest space accelerator welcomes nine companies

OpenAI has reached a deal with Reddit to use the social news site’s data for training AI models. In a blog post on OpenAI’s press relations site, the company said…

OpenAI inks deal to train AI on Reddit data

X users will now be able to discover posts from new Communities that are trending directly from an Explore tab within the section.

X pushes more users to Communities

For Mark Zuckerberg’s 40th birthday, his wife got him a photoshoot. Zuckerberg gives the camera a sly smile as he sits amid a carefully crafted re-creation of his childhood bedroom.…

Mark Zuckerberg’s makeover: Midlife crisis or carefully crafted rebrand?