The Challenges Of Building AI Apps

5:00 PM PDT • October 15, 2015

**Image Credits:** ChristianChan / Shutterstock

Mike Chalfen

Contributor

Mike Chalfen is a co-founder and partner at Mosaic Ventures.

Artificial intelligence (AI) has an intellectual lineage stretching back to the greats of computer theory: Turing and, ultimately, Babbage, inventor of the calculating machine. What we now see in London, where leading teams such as DeepMind are working on machine learning, is the movement from the realm of computer science to practical uses and business cases.

It’s not just Google, which bought the DeepMind team last year, or Facebook, with a 50-person AI lab, that see the possibilities. Roughly one-sixth of YC companies seemed to be using machine learning in the most recent cohort, while IBM has bet billions on the success of Watson, its Jeopardy question-answering supercomputer.

Thousands of companies are taking advantage of infrastructure to manage or extract insight from large datasets. They are all working to predict outcomes or recommend or execute actions based on analyses of programmatically available digitized data.

I’d like to share some of the challenges facing startups that are attempting to build AI-based applications for businesses, and how some companies are attempting to overcome those challenges. Selecting, perfecting and combining the algorithms themselves is only a small part of the thoughtful work done by the best entrepreneurs. Other important factors include:

Proprietary access to unique data that will form the basis for the training data set.
A clear view of how insight will be generated, with tightly coupled software that intelligently extracts meaning from the data or evaluates which data requires human classification.
If possible, a data model that can accommodate new data sources as they emerge.
A skilled team that can write or adapt publicly available algorithms, select the right algorithm for the desired result and combine algorithms as needed to optimize the result.
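The second factor above, software that evaluates which data requires human classification, can be illustrated with a minimal sketch. It assumes a model that outputs per-class probabilities and a fixed confidence threshold; the function name, the threshold value and the sample tickets are all hypothetical, not taken from any company mentioned here.

```python
from typing import Dict, List, Tuple


def route_for_review(
    predictions: Dict[str, Dict[str, float]],
    threshold: float = 0.8,  # assumed cut-off; a real system would tune this
) -> Tuple[List[str], List[str]]:
    """Split item ids into (auto_accept, needs_human) by top-class confidence."""
    auto_accept: List[str] = []
    needs_human: List[str] = []
    for item_id, class_probs in predictions.items():
        # The model's confidence is the probability of its most likely class.
        top_confidence = max(class_probs.values())
        if top_confidence >= threshold:
            auto_accept.append(item_id)
        else:
            needs_human.append(item_id)
    return auto_accept, needs_human


# Hypothetical model outputs for three support tickets.
predictions = {
    "ticket-1": {"billing": 0.95, "technical": 0.05},
    "ticket-2": {"billing": 0.55, "technical": 0.45},
    "ticket-3": {"billing": 0.10, "technical": 0.90},
}
auto_accept, needs_human = route_for_review(predictions)
print(auto_accept, needs_human)
```

Only the ambiguous item is sent to a person, so human effort goes where the model is least sure, and those human labels can later be added to the training set.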

A couple of years ago, data analysis of any kind was labelled as “data science.” Today, the label AI is widely applied, sometimes carelessly. So let’s first consider what is described as AI.

Current commercial applications are “narrow” or “weak” forms of AI. This means the machine specializes in one area, and cannot infer from first principles as humans can (“general AI”). Narrow AI is based on well-understood techniques being exploited commercially for the first time. What is considered AI can quickly become a well-understood data science technique.

A closely related approach is “deep learning,” whereby data inputs are not pre-described. Rather, models learn about data (and data structures), then, using multiple layers of non-linear feedback, learn important features of the data and even self-refine.

While the technique has been around for more than 20 years, wider access to compute power is finally a match for the data intensity it requires. London-based startup Improbable is an exciting example of a company using vast compute power and deep learning to simulate complex environments, ranging from open-ended game worlds to cities.

Still, many startups we meet claim to incorporate machine learning (ML) into their technologies. For most of these companies, when we scratch beneath the surface, ML is not a truly important element of the product. In some cases it is just a veneer to make a project seem cutting-edge. In others it is real, but just table stakes, so does not offer a technical barrier to competitors; rather, it enables startups to offer an increasingly accurate and efficient service for customers.

For instance, some startups will use commodity code, for which there are many significant open-source libraries. An interesting open-source project for distributed stream and batch data processing, Apache Flink, has collated a library of publicly available ML algorithms that will scale to large data sets.

Amazon launched machine learning as a service in April, and startups such as MetaMind are aiming to offer AI as a service to developers, an extension of the more crowded market of predictive analytics as a service. The reality is that most algorithms are well-known, and AI learning techniques will commoditize fast.

So companies building products using narrow AI need to be very thoughtful about how they build and improve their products or services.

The Moat: Training Data

Training data is at the heart of building distinctive narrow-AI-based products. Startups need to find sources of structured data that can help them build the best possible models. In this case, “best” means that the data set is large enough to learn from and varied enough to help a range of customers rather than only one, and that the resulting machine can seamlessly improve processes or decision-making.

Machine learning theory states that with unlimited data, we could expect all algorithms to produce similar-quality results. So startups will only resist commoditization if they have access to a unique data set and extend their early lead by continuously learning how to improve their algorithms based on end-user interactions. The most famous example is Google’s use of clickstream data as a private source of training data to improve search-ranking results.

As we have touched on before, startups sometimes confuse revenue traction with value creation. Choosing projects that yield short-term revenue based on easily available data sets is unlikely to yield a differentiated, valuable application.

One example is Digital Genius, a London- and New York-based startup focused on automating customer service conversations. The founder bootstrapped the company for its first couple of years. Admirable though that is, the initial technology and commercial choices were not scalable. The first version of its technology was very flexible, but it needed to be highly customized. Also, initial demand was for lower-value applications in marketing services. The combination was not attractive to venture investors at that point.

However, the company may well have found its way. First, the team created a platform it can reuse for many different text-based AI applications, whereas it had started with a tool set. Second, it has found a high-value focus in automating text conversations. Importantly, the algorithm is based (amongst other data quarries) on analyses of huge repositories of real call-center transcripts. This may now yield a replicable product that can be the foundation of a large business.

A Technology-Driven Process To Extract Insight And Meaning From The Data Set

Having access to useful data sets is only a start: A system needs to extract accurate metadata from the data, and use it as an input to improve the machine’s accuracy.

We find that the best AI-driven startups are focused on increasing both the throughput and the refinement and accuracy of their algorithms. That takes iteration and time — and a lot of data — to get right.

Take, for example, Unbabel, a Lisbon- and San Francisco-based startup focused on AI-augmented translation. In order to deliver, the company must create scalable methods for translators to annotate, refine and reject machine translations. The workflow software that Unbabel’s translators use to assess translation accuracy is strikingly granular. Rather than giving a simple yes/no/maybe, the translator judges 15-20 measures of accuracy and also suggests an alternative. Accuracy in this case can also include brand suitability for Unbabel’s commercial customers. The machine then uses this feedback to self-improve.
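To make the idea of granular feedback concrete, here is a minimal sketch of how a review record with several accuracy measures and an optional suggested alternative might be turned into new training pairs. This is not Unbabel’s actual workflow or data model; the class, the measure names, the 1-to-5 scale and the score threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class TranslationReview:
    source_text: str
    machine_translation: str
    scores: Dict[str, int]  # hypothetical measures, each scored 1 (poor) to 5 (good)
    suggested_alternative: Optional[str] = None

    def overall(self) -> float:
        """Average across all measures the translator scored."""
        return mean(list(self.scores.values()))


def training_pairs(
    reviews: List[TranslationReview], min_score: float = 4.0
) -> List[Tuple[str, str]]:
    """Keep human corrections, plus machine output that reviewers rated highly."""
    pairs: List[Tuple[str, str]] = []
    for review in reviews:
        if review.suggested_alternative:
            # A human correction is treated as the preferred target text.
            pairs.append((review.source_text, review.suggested_alternative))
        elif review.overall() >= min_score:
            pairs.append((review.source_text, review.machine_translation))
        # Low-scoring output with no correction is dropped.
    return pairs


reviews = [
    TranslationReview("Bom dia", "Good day",
                      {"fluency": 5, "meaning": 4, "brand_tone": 2}, "Good morning"),
    TranslationReview("Obrigado", "Thank you",
                      {"fluency": 5, "meaning": 5, "brand_tone": 5}),
    TranslationReview("Até logo", "Until soon",
                      {"fluency": 2, "meaning": 3, "brand_tone": 2}),
]
pairs = training_pairs(reviews)
print(pairs)
```

The per-measure scores carry more signal than a single yes/no flag: a record can be fluent but off-brand, and the suggested alternative gives the model a concrete target to learn from.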

That is an intelligent and well-managed approach to model improvement. It solves for quality and scale, rather than just efficiency, and acknowledges that the machine is a work in progress and not yet ready for full automation of translation tasks.

That iterative combination of training data and machine accuracy is the heart of what many startups are working through.

How Do You Make It All Work?

A lot of commentary on AI-based applications makes building them sound straightforward. Yet AI itself is rarely sufficient. As with many disruptive software opportunities, startups using AI need to be competent in multiple spheres and make the product or service easy to use.

Even once the right algorithms are chosen, a good data set is identified and a process to improve and scale ML is hardened, startups are often just at the starting line. Some challenges (and often ones that are worthy of venture capital funding) require innovation on multiple fronts. Even for narrowly focused startups, engineering challenges are rarely one-dimensional.

IT operations-focused startup Moogsoft is a good example (full disclosure: I am an angel investor). Phil Tee, the founder and CEO of Moog, is a fifth-time founder, and as the founding CTO of Micromuse was responsible for the dominant incumbent in network management. His goal was to work out how to process millions of different event data points so that IT operations could be evaluated across the full stack.

He saw that he would need to build a machine that was model-free so that it could make sense of new data sources on the fly, as operations evolved. This required the technical chops to build relevant algorithms that together could cope with untagged data. Phil then went further and broke additional ground by predicting faults — all whilst tuning the machine for processing at scale and in real time.

The team also needed to have the understanding of enterprise use cases so that the software was effective in reducing time to resolve and troubleshoot tickets, and in delivering transparency to the affected organizations. This combination is not trivial.

Of the many potential applications of AI that get us excited — for example, automated code generation, QA or optimization platform, automated risk and lending decisions in the financial supply chain, automated legal documentation and contract analytics, or automated visual assessments such as health checks or insurance claims adjustments — many fall into this category of startup management and engineering challenges that are not going to be straightforward to solve.

What’s The Right Team?

Assembling the right team is a challenge. Supply of graduates from the world’s best computational linguistics, machine learning and data science programs cannot meet demand. Google and Facebook are building teams and acquiring startups with critical mass, offering recruits the chance to work on both general and narrow AI problems with enormous resources at their disposal. Their pay scales make it difficult for smaller startups to recruit. Startup CEOs who are aiming to recruit the best in their specific fields have to recruit globally.

Most importantly, startups must offer recruits an exciting problem to solve in order to attract a world-class team. At least, as we’ve shown, the valuable problems also tend to be the harder ones. Mere efficiency gains are not attractive enough as a mission to attract the best. And once the ML team is assembled, as the Moog example shows, wider skills are needed to turn a working machine into a commercially viable product.

AI, predictive analytics and data science-driven startups are only going to grow in size and in importance. Navigating how to build them is not straightforward.

If you are working on a very ambitious project in this field and have identified unique or proprietary training data, and have a product and a business model that can capitalize on the insights from the data and a well-rounded team to go to market, please get in touch; we would love to learn more.
