
Active learning is the future of generative AI: Here’s how to leverage it


Image Credits: Andriy Onufriyenko / Getty Images

Eric Landau

Contributor

Before Eric Landau co-founded Encord, he spent nearly a decade at DRW, where he was lead quantitative researcher on a global equity delta one desk and put thousands of models into production. He holds an S.M. in Applied Physics from Harvard University, an M.S. in Electrical Engineering and a B.S. in Physics from Stanford University.


During the past six months, we have witnessed some incredible developments in AI. The release of Stable Diffusion forever changed the art world, and ChatGPT shook up the internet with its ability to write songs, mimic research papers and provide thorough and seemingly intelligent answers to commonly Googled questions.

These advancements in generative AI offer further evidence that we’re on the precipice of an AI revolution.

However, most of these generative AI models are foundation models: high-capacity, unsupervised learning systems that train on vast amounts of data and take millions of dollars of processing power to do it. Currently, only well-funded institutions with access to a massive amount of GPU power are capable of building these models.

The majority of companies developing the application-layer AI that’s driving the widespread adoption of the technology still rely on supervised learning, using large swaths of labeled training data. Despite the impressive feats of foundation models, we’re still in the early days of the AI revolution and numerous bottlenecks are holding back the proliferation of application-layer AI.

Downstream of the well-known data labeling problem exist additional data bottlenecks that will hinder the development of later-stage AI and its deployment to production environments.

These problems are why, despite the early promise and floods of investment, technologies like self-driving cars have been just one year away since 2014.

These exciting proof-of-concept models perform well on benchmarked datasets in research environments, but they struggle to predict accurately when released in the real world. A major problem is that the technology struggles to meet the higher performance threshold required in high-stakes production environments and fails to hit important benchmarks for robustness, reliability and maintainability.

For instance, these models often can’t handle outliers and edge cases, so self-driving cars mistake reflections of bicycles for bicycles themselves. They aren’t reliable or robust so a robot barista makes a perfect cappuccino two out of every five times but spills the cup the other three.

As a result, the AI production gap, the gap between “that’s neat” and “that’s useful,” has been much larger and more formidable than ML engineers first anticipated.

Fortunately, as more and more ML engineers have embraced a data-centric approach to AI development, the implementation of active learning strategies has been on the rise. The most sophisticated companies will leverage this technology to leapfrog the AI production gap and build models capable of running in the wild more quickly.

What is active learning?

Active learning makes training a supervised model an iterative process. The model trains on an initial subset of labeled data from a large dataset. Then, it tries to make predictions on the rest of the unlabeled data based on what it has learned. ML engineers evaluate how certain the model is in its predictions and, by using a variety of acquisition functions, can quantify the performance benefit added by annotating one of the unlabeled samples.

By expressing uncertainty in its predictions, the model is deciding for itself what additional data will be most useful for its training. In doing so, it asks annotators to provide more examples of only that specific type of data so that it can train more intensively on that subset during its next round of training. Think of it like quizzing a student to figure out where their knowledge gap is. Once you know what problems they are missing, you can provide them with textbooks, presentations and other materials so that they can target their learning to better understand that particular aspect of the subject.

With active learning, training a model moves from being a linear process to a circular one with a strong feedback loop.
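The loop described above can be sketched in a few lines. This is an illustrative example of uncertainty sampling using a least-confidence acquisition function; the dataset, model choice and query size are our own assumptions, not a reference to any particular vendor's pipeline.

```python
# Minimal active-learning loop with a least-confidence acquisition function.
# Sketch only: in practice, "annotation" means sending samples to human labelers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Seed set: a small labeled subset guaranteed to contain both classes.
idx0 = np.where(y == 0)[0][:10]
idx1 = np.where(y == 1)[0][:10]
labeled = list(idx0) + list(idx1)
unlabeled = [i for i in range(1000) if i not in set(labeled)]

model = LogisticRegression(max_iter=1000)
for _ in range(5):
    # 1. Train on what has been labeled so far.
    model.fit(X[labeled], y[labeled])
    # 2. Predict on the unlabeled pool and measure uncertainty
    #    (least confidence: 1 minus the top predicted probability).
    probs = model.predict_proba(X[unlabeled])
    uncertainty = 1.0 - probs.max(axis=1)
    # 3. Query the 20 samples the model is least sure about for annotation.
    query = np.argsort(uncertainty)[-20:]
    newly_labeled = [unlabeled[i] for i in query]
    labeled.extend(newly_labeled)
    unlabeled = [i for i in unlabeled if i not in set(newly_labeled)]

print(f"labeled pool grew to {len(labeled)} examples")  # → 120
```

Each round, annotation effort goes only where the model is uncertain, which is the feedback loop that makes the process circular rather than linear.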

Why sophisticated companies should be ready to leverage active learning

Active learning is fundamental for closing the prototype-production gap and increasing model reliability.

It’s a common mistake to think of AI systems as static pieces of software; these systems must be constantly learning and evolving. If not, they make the same mistakes repeatedly, or, when they’re released in the wild, they encounter new scenarios, make new mistakes and don’t have an opportunity to learn from them. They need the ability to learn over time, making corrections based on previous mistakes as a human would. Otherwise, models will have issues with reliability and robustness, and AI systems will not work in perpetuity.

Most companies using deep learning to solve real-world problems will need to incorporate active learning into their stack. If they don’t, they’ll lag their competitors. Their models won’t respond to or learn from the shifting landscape of possible scenarios.

However, incorporating active learning is easier said than done. For years, a lack of tooling and infrastructure made it difficult to facilitate active learning. Out of necessity, companies that began taking steps to improve their models’ performance with respect to the data have had to take a Frankenstein approach, cobbling together external tools and building tools in-house.

As a result, they don’t have an integrated, comprehensive system for model training. Instead, they have modular block-like processes that can’t talk to each other. They need a flexible system made up of decomposable components in which the processes communicate with one another as they go along the pipeline and create an iterative feedback loop.

The best ways to leverage active learning

Some companies, however, have implemented active learning to great effect, and we can learn from them. Companies that have yet to put active learning in place can also do a few things to prepare for, and make the most of, this methodology.

The gold standard of active learning is a stack that forms a fully iterative pipeline. Every component runs in service of optimizing the performance of the downstream model: data selection, annotation, review, training and validation are done with an integrated logic rather than as disconnected units.

Counterintuitively, the best systems also have the most human interaction. They fully embrace the human-in-the-loop nature of iterative model improvement by opening up entry points for human supervision within each subprocess while also maintaining optionality for completely automated flows when things are working.

The most sophisticated companies therefore have stacks that are iterative, granular, inspectable, automatable and coherent.
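One hypothetical way such a stack can be structured: decomposable stages that read and write a shared state, with an optional human-review entry point inside the annotation step. The stage names and data shapes here are purely illustrative, not any real tool's API.

```python
# Sketch of an iterative pipeline: each stage is a separate, inspectable unit,
# but all stages communicate through shared state, forming one feedback loop.
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    labeled: list = field(default_factory=list)
    candidates: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)

def select(state: PipelineState) -> None:
    # Stand-in for an acquisition function choosing uncertain samples.
    state.candidates = ["sample_a", "sample_b"]

def annotate(state: PipelineState, review=None) -> None:
    labels = {s: "auto_label" for s in state.candidates}
    if review is not None:
        labels = review(labels)  # human-in-the-loop entry point
    state.labeled.extend(labels.items())

def train_and_validate(state: PipelineState) -> None:
    state.metrics["accuracy"] = 0.9  # stand-in for a real training run

state = PipelineState()
for _ in range(3):  # the feedback loop: selection depends on the latest model
    select(state)
    annotate(state, review=lambda labels: labels)  # automated pass-through
    train_and_validate(state)

print(len(state.labeled))  # 6
```

Because each stage is a plain function over shared state, the flow stays inspectable and granular, yet the `review` hook can be swapped between a human reviewer and full automation without restructuring the pipeline.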

Companies seeking to build neural networks that take advantage of active learning should build their stacks with the future in mind. These ML teams should project the types of problems they’ll have and understand the issues they’re likely to encounter when attempting to run their models in the wild. What edge cases will they encounter? In what unreasonable way is the model likely to behave?

If ML teams don’t think through these scenarios, models will inevitably make mistakes in a way that a human never would. Those errors can be quite embarrassing for companies, and they’re penalized heavily because they’re so misaligned with human behavior and intuition.

Fortunately, for companies just entering the game, there’s now plenty of know-how and knowledge to be gained from companies that have broken through the production barrier. With more and more companies putting models into production, ML teams can anticipate problems by studying their predecessors, as they will likely face similar issues when moving from proof of concept to production.

Another way to troubleshoot problems before they occur is to think about what a working model looks like beyond its performance metric scores. By thinking about how that model should operate in the wild and the sorts of data and scenarios it will encounter, ML teams will better understand the kinds of issues that might arise once it’s in the production stage.

Lastly, companies should make themselves aware of and understand the tools available to support an active learning and training data pipeline. Five or six years ago, companies had to build infrastructure internally and combine these in-house tools with imperfect external ones. Nowadays, every company should think before they build something internally. New tooling is being developed rapidly, and it’s likely that there’s already a tool that will save time and money while requiring no internal resourcing to maintain it.

Active learning is still in its very early days. However, every month, more companies are expressing an interest in taking advantage of this methodology. The most sophisticated ones will put the infrastructure, tooling and planning in place to harness its power.
