Finding the Goldilocks zone for applied AI

Image Credits: MirageC / Getty Images

Ivy Nguyen

Contributor

Ivy Nguyen is an associate at Zetta Venture Partners.

While Elon Musk and Mark Zuckerberg debate the dangers of artificial general intelligence, startups applying AI to more narrowly defined problems such as accelerating the performance of sales teams and improving the operating efficiency of manufacturing lines are building billion-dollar businesses. Narrowly defining a problem, however, is only the first step to finding valuable business applications of AI.

To find the right opportunity around which to build an AI business, startups must apply the “Goldilocks principle” in several different dimensions to find the sweet spot that is “just right” to begin — not too far in one dimension, not too far in another. Here are some ways for aspiring startup founders to thread the needle with their AI strategy, based on what we’ve learned from working with thousands of AI startups.

“Just right” prediction time horizons

Unlike pre-intelligence software, AI systems respond to the environment in which they operate; algorithms take in data and return an answer or prediction. Depending on the application, that prediction may describe an outcome in the near term, such as tomorrow’s weather, or an outcome many years in the future, such as whether a patient will develop cancer in 20 years. The time horizon of the algorithm’s prediction is critical to its usefulness and to whether it offers an opportunity to build defensibility.

Algorithms making predictions with long time horizons are difficult to evaluate and improve. For example, an algorithm may use the schedule of a contractor’s previous projects to predict that a particular construction project will fall six months behind schedule and go over budget by 20 percent. Until this new project is completed, the algorithm designer and end user can only tell whether the prediction is directionally correct — that is, whether the project is falling behind or costs are higher.

Even when the final project numbers end up very close to the predicted numbers, it will be difficult to complete the feedback loop and positively reinforce the algorithm. Many factors may influence complex systems like a construction project, making it difficult to A/B test the prediction to tease out the input variables from unknown confounding factors. The more complex the system, the longer it may take the algorithm to complete a reinforcement cycle, and the more difficult it becomes to precisely train the algorithm.

While many enterprise customers are open to piloting AI solutions, startups must be able to validate the algorithm’s performance in order to complete the sale. The most convincing way to validate an algorithm is by using the customer’s real-time data, but this approach may be difficult to achieve during a pilot. If the startup does get access to the customer’s data, the prediction time horizon should be short enough that the algorithm can be validated during the pilot period.

Historical data, if it’s available, can serve as a stopgap to train an algorithm and temporarily validate it via backtesting. However, training an algorithm that makes long-time-horizon predictions on historical data is risky: processes and environments are more likely to have changed the further back you dig into the records, making older data sets less descriptive of present-day conditions.
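To make the backtesting idea concrete, here is a minimal sketch in Python. The delay figures, the moving-average predictor and the function name `walk_forward_backtest` are illustrative assumptions, not anything taken from a real project or product. The sketch replays history in order, predicts each point from the points before it, and compares the error on recent records alone with the error across the full archive, which includes an older period when conditions were different.

```python
from statistics import mean

def walk_forward_backtest(history, window):
    """Predict each point from the mean of the `window` points before it
    and return the mean absolute error over all predictions."""
    errors = []
    for i in range(window, len(history)):
        prediction = mean(history[i - window:i])  # uses only past data
        errors.append(abs(history[i] - prediction))
    return mean(errors)

# Hypothetical project delays in days, oldest first. The jump after the
# fourth record stands in for a change in processes or environment.
delays = [10, 12, 11, 13, 30, 32, 31, 33]

recent_error = walk_forward_backtest(delays[4:], window=2)  # recent records only
full_error = walk_forward_backtest(delays, window=2)        # entire archive

print(recent_error, full_error)
```

With these made-up numbers, the error across the full archive is several times larger than the error on recent records, because the backtest straddles the point where conditions changed.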

In other cases, the historical data describing outcomes exists for you to train an algorithm, but it does not capture the input variable under consideration. In the construction example, you might discover that sites using blue safety hats are more likely to complete projects on time, but because hat color wasn’t previously considered useful for managing projects, it was never recorded in the archival records. This data must be captured from scratch, which further delays your time to market.

Instead of making singular “hero” predictions with long time horizons, AI startups should build multiple algorithms making smaller, simpler predictions with short time horizons. Decomposing an environment into simpler subsystems or processes limits the number of inputs, making it easier to control for confounding factors. The BIM 360 Project IQ Team at Autodesk takes this small-prediction approach to areas that contribute to construction project delays. Their models predict safety and score vendor and subcontractor quality/reliability, all of which can be measured while a project is ongoing.

Shorter time horizons make it easier for the algorithm engineer to monitor changes in the algorithm’s performance and act quickly to improve it, instead of being limited to backtesting on historical data. The shorter the time horizon, the shorter the algorithm’s feedback loop will be. Because each cycle through the feedback loop incrementally compounds the algorithm’s performance, shorter feedback loops are better for building defensibility.
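A rough way to see the effect of the time horizon is to count how many prediction-and-outcome cycles fit inside a pilot. The sketch below is a simplification under stated assumptions: the 90-day pilot length and the function name `completed_feedback_cycles` are hypothetical, and it assumes each prediction is only scored once its full horizon has elapsed.

```python
def completed_feedback_cycles(pilot_days: int, horizon_days: int) -> int:
    """Number of back-to-back prediction cycles that finish within the pilot."""
    if horizon_days <= 0:
        raise ValueError("horizon_days must be positive")
    return pilot_days // horizon_days

pilot_days = 90  # hypothetical pilot length

weekly = completed_feedback_cycles(pilot_days, 7)        # one-week horizon
six_month = completed_feedback_cycles(pilot_days, 180)   # six-month horizon

print(weekly, six_month)
```

Under these assumptions, a one-week horizon yields a dozen scored predictions during the pilot, while a six-month horizon yields none, so the customer would have nothing to validate before the pilot ends.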

“Just right” actionability window

Most algorithms model dynamic systems and return a prediction for a human to act on. Depending on how quickly the system is changing, the algorithm’s output may not remain valid for very long: the prediction may “decay” before the user can take action. In order to be useful to the end user, the algorithm must be designed to accommodate the limitations of computing and human speed. 

In a typical AI-human workflow, the human feeds input data into the algorithm, the algorithm runs calculations on that input data and returns an output that predicts a certain outcome or recommends a course of action; the human interprets that information to decide on a course of action, then takes action. The time it takes the algorithm to compute an answer and the time it takes for a human to act on the output are the two largest bottlenecks in this workflow. 

For most of AI history, slow computational speeds have severely limited the scope of applied AI. An algorithm’s prediction depends on the input data, and the input data represents a snapshot of the environment at the moment it was recorded. If the environment described by the data changes faster than the algorithm can process the input data, then by the time the algorithm returns a prediction, that prediction will only describe a moment in the past and will not be actionable. For example, running on the computational power of a Windows 95 computer, the algorithm behind the music app Shazam may have needed several hours to identify a song after first “hearing” it.
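The actionability constraint can be written as a simple budget: the time to compute a prediction plus the time to act on it must fit within the period during which the prediction remains valid. The sketch below illustrates that check in Python. The class name `Workflow`, its fields and all of the timings are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Workflow:
    compute_seconds: float   # time for the algorithm to return a prediction
    act_seconds: float       # time to interpret the output and take action
    validity_seconds: float  # how long the prediction stays valid

    def slack(self) -> float:
        """Time left over once the prediction has been computed and acted on."""
        return self.validity_seconds - (self.compute_seconds + self.act_seconds)

    def is_actionable(self) -> bool:
        return self.slack() >= 0

# Hypothetical timings: the same 60-second validity window, with a person
# taking ten minutes to respond versus an automated step taking one second.
manual = Workflow(compute_seconds=5, act_seconds=600, validity_seconds=60)
automated = Workflow(compute_seconds=5, act_seconds=1, validity_seconds=60)

print(manual.is_actionable(), automated.is_actionable())
```

In this toy case the prediction decays long before the person can respond, while the same prediction remains usable when the response step is fast enough to fit inside the window.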

The rise of cloud computing and the development of hardware specially optimized for AI computations have dramatically broadened the scope of areas where applied AI is actionable and affordable. While macro tech advancements can greatly advance applied AI, the algorithm is not totally held hostage to current limits of computation; reinforcement through training also can improve the algorithm’s response time. The more examples of the same kind an algorithm encounters, the more quickly it can skip computations to arrive at a prediction. Thanks to advances in computation and reinforcement, today Shazam takes less than 15 seconds to identify a song.

Automating the decision and action also could help users make use of predictions that decay too quickly to wait for humans to respond. Opsani is one such company using AI to make decisions that are too numerous and fast-moving for humans to make effectively. Unlike human DevOps engineers, who can only move so fast to optimize performance based on recommendations from an algorithm, Opsani applies AI to both identify and automatically improve operations of applications and cloud infrastructure so its customers can enjoy dramatically better performance.

Not all applications of AI can be completely automated, however: the perceived risk may be too high for end users to accept, or regulations may mandate that humans approve the decision.

“Just right” performance minimums

Just as software startups launch once they have built a minimum viable product (MVP) in order to collect actionable feedback from initial customers, AI startups should launch when they reach the minimum algorithmic performance (MAP) required by early adopters, so that the algorithm can be trained on more diverse and fresh data sets and avoid becoming overfit to a training set.

Most applications don’t require 100 percent accuracy to be valuable. For example, a fraud detection algorithm may catch only five percent of fraud cases within 24 hours of when they occur, while human fraud investigators catch 15 percent of fraud cases after a month of analysis. In this case, the MAP is zero, because even at that level the algorithm can serve as a first filter that reduces the number of cases the human investigators must process. The startup can go to market immediately in order to secure access to the large volume of fraud data used for training its algorithm. Over time, the algorithm’s accuracy will improve and reduce the burden on human investigators, freeing them to focus on the most complex cases.

Startups building algorithms for zero or low MAP applications will be able to launch quickly, but must keep looking over their shoulder for copycats, which are a threat if they appear before the algorithm has reached a high level of performance.

Startups attacking low MAP problems also should watch out for problems that can be solved with near 100 percent accuracy with a very small training set, where the problem being modeled is relatively simple, with few dimensions to track and few possible variations in outcome.

AI-powered contract processing is a good example of an application where the algorithm’s performance plateaus quickly. There are thousands of contract types, but most of them share key fields: the parties involved, the items of value being exchanged, time frame, etc. Specific document types like mortgage applications or rental agreements are highly standardized in order to comply with regulation. Across multiple startups, we have seen algorithms that automatically process these documents needing only a few hundred examples to train to an acceptable degree of accuracy before additional examples do little to improve the algorithm, making it easy for new entrants to match incumbents and earlier entrants in performance.
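One way to check whether an application plateaus this quickly is to plot a learning curve and look for the point where more data stops helping. The Python sketch below is a deliberately simple version of that check. The accuracy figures, the one-percentage-point threshold and the function name `plateau_size` are all hypothetical, and the function stops at the first small gain rather than testing whether later gains resume.

```python
def plateau_size(curve, min_gain=0.01):
    """Return the first training-set size after which the next step up in
    data improves accuracy by less than `min_gain`, or None if every step
    still helps. `curve` is a list of (size, accuracy) pairs sorted by size."""
    for (size, accuracy), (_, next_accuracy) in zip(curve, curve[1:]):
        if next_accuracy - accuracy < min_gain:
            return size
    return None

# Hypothetical learning curve for a document-processing model:
# (number of training examples, accuracy on a held-out set).
curve = [(50, 0.70), (100, 0.82), (200, 0.90),
         (400, 0.93), (800, 0.934), (1600, 0.935)]

print(plateau_size(curve))
```

With these invented numbers the curve flattens at a few hundred examples, which is the situation in which a new entrant can match an incumbent’s performance with little data.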

AIs built for applications where human labor is inexpensive and able to easily achieve high accuracy may need to reach a higher MAP before they can find an early adopter. Tasks requiring fine motor skills, for example, have yet to be taken over by robots because human performance sets a very high MAP to overcome. When picking up an object, the AIs powering the robotic hand must gauge an object’s stiffness and weight with a high degree of accuracy, otherwise the hand will damage the object being handled. Humans can very accurately gauge these dimensions with almost no training. Startups attacking high MAP problems must invest more time and capital into acquiring enough data to reach MAP and launch. 

Threading the needle

Narrow AI can demonstrate impressive gains in a wide range of applications in the research lab. Building a business around a narrow AI application, on the other hand, requires a new playbook. The process depends heavily on the specifics of the use case along every dimension, and the performance of the algorithm is merely one starting point. There’s no one-size-fits-all approach to moving an algorithm from the research lab to the market, but we hope these ideas will provide a useful blueprint for you to begin.
