Getting predictive about politics (and everything else)


Image Credits: DonkeyHotey / Flickr under a CC BY-ND 2.0 license.

David Elkington

Contributor

David Elkington is the founder and chief executive of InsideSales.com.

As the polls were closing for the 1948 presidential election between Thomas Dewey and Harry Truman, the Chicago Tribune went to print with the headline “Dewey Defeats Truman” based on early voting predictions. The mistake led to the forever-immortalized photo of a victorious President Truman triumphantly holding that edition of the paper and quipping, “That ain’t the way I heard it!”

Nearly seven decades later we’re still seeing incorrect predictions in political races. Take the 2016 Iowa caucus for example, where Ted Cruz triumphed despite expert predictions to the contrary. It’s enough to make you wonder: What data is used to create these predictions? And with today’s technology, why isn’t it more accurate?

Big data has gotten, well, big over the past few years, thanks to platforms like Hadoop (and commercial distributions from Hortonworks and Cloudera) built to manage the 2.5 quintillion bytes of data being created every day, a global Internet population that has reached 3.2 billion people and dramatic increases in both computing power and storage capacity. In fact, the real magic of predictive analytics is big data, as noted in this insightful article on the VersionOne blog.

So why aren’t we living in a world where computers solve crimes, drive cars, cure sicknesses, and accurately predict political races? 

The reason is that it’s not enough simply to store, access and process data. Even sophisticated algorithms will only get you so far.

The key to successful machine learning and artificial intelligence algorithms comes down to two factors: huge quantities of timely data and of accurate data. Without these, the whole premise of predictive analytics falls apart.

As the frequent inaccuracies behind political polling and other predictions suggest, finding both timely and accurate data is easier said than done. There are four typical methods you can use to acquire data, and each of them ties directly to the value of that data.

Photo courtesy of Flickr/David Erickson.

1. Required self-declared data

The first and most basic way companies acquire data is through government organizations that require individuals and businesses to provide those entities with their information. 

Examples include registering a business with a state, companies or individuals paying taxes, census registration, individuals applying for a driver’s license or Social Security and even individuals declaring a political party preference.

Because of the Freedom of Information Act (FOIA), some of this data is publicly available. But collecting this type of data has both advantages and disadvantages. 

The quantity and breadth of data through these sources is robust, but the information is neither particularly timely nor accurate, because it is collected infrequently while people’s circumstances constantly change. The full census happens only every 10 years, driver’s licenses last four to six years, and on average people move every six years and change jobs every four.

Typically, data becomes invalid at roughly the rate of 30 percent to 40 percent annually. After two and a half to three years, any list you have is completely out of date.
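That decay compounds quickly. As a back-of-the-envelope sketch (assuming, for simplicity, a constant annual decay rate applied year over year), the fraction of a list still valid after t years is (1 − r)^t:

```python
# Rough check on the decay rates quoted above.
# Assumes a constant annual decay rate, which is a simplification.

def valid_fraction(annual_decay_rate: float, years: float) -> float:
    """Fraction of records still valid after `years` at the given decay rate."""
    return (1 - annual_decay_rate) ** years

# At 30-40% annual decay, only roughly a fifth to two-fifths of a list
# survives 2.5-3 years, consistent with calling it out of date by then.
for rate in (0.30, 0.40):
    for years in (2.5, 3.0):
        print(f"decay {rate:.0%}, {years} years: "
              f"{valid_fraction(rate, years):.0%} still valid")
```

At the 40 percent rate, fewer than a quarter of records survive three years, which is why any such list needs continual refreshing.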

Additionally, the accuracy rate of required self-declared data is typically low because people are not incentivized to provide accurate data. People are often suspicious of governments and therefore supply just enough information to avoid violating the law.

2. “Brute force” self-declared or observed data

Photo courtesy of Flickr/WyoFile.

 

When organizations like Dun & Bradstreet and Hoover’s call people and interview them, they are capturing self-declared data almost by brute force. It’s not volunteered data, because individuals aren’t sharing it publicly, but it’s not required data either, because people can decline to participate without violating the law.

When other organizations use web crawlers to collect people’s information, they are acquiring observed data by brute force. Because this data is either given directly by the subject or observed firsthand, it has a higher accuracy rate than government data: it’s not subject to as many faulty interpretations, biases and dishonest responses.

Political polls frequently rely on this kind of data collection. Organizations like Pew Research Center survey likely voters to interview them on their level of engagement in the election, perceptions on candidates in play and voter preferences. Because this data is self-declared and not required, interviewees are more likely to provide honest answers.

While this data collection method does improve accuracy, the data isn’t updated often enough because the process is labor-intensive: Individually surveying hundreds of U.S. citizens on their voting preferences takes considerable time, and may miss shifting opinions following a recent debate, public statement, etc. Therefore, with brute force data, timeliness suffers.

3. Volunteered self-declared data

Organizations can collect huge volumes of self-declared data on social media sites like Facebook, Twitter and LinkedIn.

The Pew Research Center found that 74% of online adults use social networking sites. Statista reported that more than 1.5 billion people are actively using Facebook while 400 million are active on Instagram and 316 million are tweeting up a storm.

Social media data can be especially useful when taking a pulse of consumer perception around current events, such as the ongoing presidential elections. With people updating their online profiles and posting status updates 24/7, this is some of the timeliest data the world has ever seen.

It’s still not highly accurate, though, because it’s self-declared. We show our Facebook friends only the life we want them to believe we have. We’re curating our personal brand. Have you ever seen anybody admit they were fired on their LinkedIn profile?

4. Crowdsourced observed data

Technology that provides value in our personal or business life can observe our behavior in the process.

As we make purchases, travel and make phone calls, technological systems observe our behavior. These systems can amass massive amounts of data without asking us, purely through the power of observation.

Amazon is just one example of a company that is crowdsourcing data across multiple vendors, individuals, geographies, industries, etc., with 294 million active customer accounts worldwide. The company is processing so many transactions that it saw net sales increase 23% to $25.4 billion in the third quarter of 2015. 

With timeliness and accuracy both amazingly high, crowdsourced observed data is the most sophisticated and accurate source of data available.

The optimal blend

Photo courtesy of Flickr/DonkeyHotey.

Generally, the harder it is to acquire data, the more valuable it is. The best way to ensure you, or political pollsters, have timely and accurate data is to use a combination of all four sources. This optimal blend creates “high-definition” data: a collection of crowdsourced observed data blended with the other three sources to provide true predictive value and accurate results.
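The article doesn’t prescribe a formula for such a blend, but one minimal sketch (with entirely invented scores and estimates) is to weight each source’s estimate by its accuracy and timeliness, so the crowdsourced observed data dominates while the other three still contribute:

```python
# Toy sketch of blending the four data sources into one estimate.
# All numbers here are invented for illustration only.

sources = {
    # source: (estimate, accuracy score, timeliness score)
    "required_self_declared": (0.46, 0.5, 0.2),
    "brute_force":            (0.49, 0.7, 0.4),
    "volunteered":            (0.53, 0.5, 0.9),
    "crowdsourced_observed":  (0.51, 0.9, 0.9),
}

def blend(sources):
    """Weight each source's estimate by accuracy x timeliness, then normalize."""
    weights = {name: acc * time for name, (_, acc, time) in sources.items()}
    total = sum(weights.values())
    return sum(est * weights[name]
               for name, (est, _, _) in sources.items()) / total

print(f"blended estimate: {blend(sources):.3f}")
```

The design choice here is simply that a source earns influence in proportion to how accurate and how fresh it is, which is the intuition behind blending rather than relying on any single source.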

It’s the only way to produce maximum resolution and maximum clarity — and to predict political races more accurately.

But political polls are just one example of how predictive intelligence is playing out in our daily lives. It’s important to note that any intelligence is only as good as its source, and with new ways of gathering data popping up every day, the ugly truth is that there’s not one crystal ball that gets it right 100 percent of the time. Take social media for example: As we saw in President Obama’s successful social networking campaign in 2008, social media’s role in politics is continuing to grow. The abundance of data hosted through social channels opens a significant opportunity for real-time readings of nationwide sentiment on candidates and campaigns.

As new forms of data and analysis continue to emerge, data-based predictions will only get more and more accurate, but take them with the grain of salt they deserve. What is guaranteed is that this election season will be fascinating to watch play out, and we’ll only know for certain what happens in November.
