With Kite’s demise, can generative AI for code succeed?

Kite, a startup developing an AI-powered coding assistant, abruptly shut down last month. Despite securing tens of millions of dollars in VC backing, Kite struggled to pay the bills, founder Adam Smith revealed in a postmortem blog post; the company ran into engineering headwinds that made finding product-market fit essentially impossible.

“We failed to deliver our vision of AI-assisted programming because we were 10+ years too early to market, i.e., the tech is not ready yet,” Smith said. “Our product did not monetize, and it took too long to figure that out.”

Kite’s failure doesn’t bode well for the many other companies pursuing — and attempting to commercialize — generative AI for coding. Copilot, a code-generating tool developed by GitHub and OpenAI and priced at $10 per month, is perhaps the highest-profile example. But Smith notes that while Copilot shows a lot of promise, it still has “a long way to go” — estimating that it could cost over $100 million to build a “production-quality” tool capable of synthesizing code reliably.

To get a sense of the challenges that lie ahead for players in the generative code space, TechCrunch spoke with startups developing AI systems for coding, including Tabnine and DeepCode, which Snyk acquired in 2020. Like Copilot, Tabnine’s service predicts and suggests next lines of code based on context and syntax. DeepCode works a bit differently, using AI to notify developers of bugs as they code.

Tabnine CEO Dror Weiss was transparent about what he sees as the barriers standing in the way of code-synthesizing systems’ mass adoption: the AI itself, user experience and monetization.

Systems like Copilot are developed by taking huge amounts of data from the web — mainly open source codebases — and “training” AI models on the resulting dataset until they “learn” the statistical relationships in the code. For instance, when Copilot users type some code or comments, the system suggests the next line of code, including complete methods, boilerplate code, unit tests and algorithms.
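To make the mechanics concrete, here’s a minimal sketch of how a next-line suggestion might be produced with an off-the-shelf open source code model. The model choice and decoding settings are illustrative assumptions; Copilot’s actual pipeline is far more elaborate.

```python
# Minimal next-line completion sketch; the model and settings are
# illustrative assumptions, not Copilot's actual stack.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Salesforce/codegen-350M-mono"  # assumed: any small open code model works here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

# The developer's code-so-far acts as the prompt.
prompt = "# return the nth Fibonacci number\ndef fib(n):\n"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding of a short continuation; production assistants layer
# ranking, filtering and editor integration on top of this core loop.
outputs = model.generate(
    **inputs,
    max_new_tokens=48,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```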

From a computational standpoint, these systems are expensive to develop. Codex, the OpenAI model underpinning Copilot, has 12 billion parameters, or the parts of the model learned from the data on which it trained. (Think of parameters as variables that determine how accurately the model performs a task, e.g., generating code.) A 2020 study from AI21 Labs pegged the cost of developing a text-generating model with just 1.5 billion parameters at $1.6 million, not factoring in the cost of hiring engineering talent.
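For a rough sense of that scale, consider just the memory needed to hold a model’s weights, which grows linearly with parameter count. The bytes-per-parameter figure below is a back-of-envelope assumption (half-precision weights), not a number from the article:

```python
# Back-of-envelope weight memory: excludes optimizer state, activations
# and serving overhead. Bytes-per-parameter is an assumption (fp16).
def weight_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    return params * bytes_per_param / 1e9

print(weight_memory_gb(12e9))   # Codex-scale: ~24 GB just for weights
print(weight_memory_gb(175e9))  # GPT-3-scale: ~350 GB just for weights
```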

Inference — actually running a trained model — is another drain on financial resources. One source estimates the cost of running OpenAI’s text-generating GPT-3 model, which has around 175 billion parameters, on a single AWS instance (p3dn.24xlarge) would be $87,000 per year at a minimum.
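The annual figure is simple rate math. The hourly price below is an assumption chosen to reproduce that estimate — roughly a discounted reserved-instance rate, not AWS’s current on-demand pricing:

```python
# Annualizing an always-on GPU instance; the hourly rate is an assumption,
# not a quoted AWS price.
HOURLY_RATE_USD = 9.93   # illustrative discounted rate for a p3dn.24xlarge
HOURS_PER_YEAR = 24 * 365

print(f"${HOURLY_RATE_USD * HOURS_PER_YEAR:,.0f} per year")  # ~$87,000
```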

Setting aside the cost roadblocks of training and running code-generating AI models, there’s the matter of acquiring the right data to train them on. Snyk AI head Veselin Raychev, who oversees data science research at DeepCode, points out that while sample code is available in abundance, carefully labeled code that helps an AI system pick up on the right programming patterns often isn’t.

Consider a snippet of code containing a type of bug. Without a label to spotlight said bug, an AI system might unwittingly learn to replicate it in the code that it generates.

There’s an easy solution, right? Manually adding labels. But doing so isn’t cheap at scale. Companies regularly spend heavily to label data for AI model training. The cost of labeling code would likely be even higher, given the expertise required.
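To picture what that labeling work looks like, here’s a hypothetical training sample; the schema is invented for illustration and isn’t DeepCode’s or anyone else’s actual format:

```python
# Hypothetical labeled sample. Without the label, a model trained on this
# snippet could learn to reproduce the off-by-one bug it contains.
labeled_sample = {
    "code": (
        "def last_element(items):\n"
        "    return items[len(items)]\n"  # bug: valid indices end at len - 1
    ),
    "labels": [
        {
            "line": 2,
            "issue": "IndexError",
            "note": "off-by-one; should be items[len(items) - 1]",
        },
    ],
}
```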

“How to gain the most value out of the available data is where the tech battle in the next few years will be,” Raychev told TechCrunch in an email interview.

Another battle brewing on the data front concerns fair use, or the doctrine in U.S. law that permits the use of copyrighted material without first having to obtain permission from the rights holder. Companies like OpenAI and GitHub claim that fair use protects them in the event their systems are trained knowingly or unknowingly on copyrighted code.

But not everyone agrees. The Free Software Foundation, a nonprofit that advocates for the free software movement, has called Copilot “unacceptable and unjust.” Microsoft, GitHub and OpenAI are being sued in a class action lawsuit that accuses them of violating copyright law by allowing Copilot to regurgitate sections of licensed code without providing credit.

Amazon, which offers its own generative coding system called CodeWhisperer, attempts to skirt the fair use question by having the tool cite sources for code it suggests. As for Tabnine and DeepCode, both claim that they only train their AI on codebases with permissive licenses that explicitly allow reuse.

“We plan to allow open source projects to opt out of AI training and to give attribution in Tabnine to all repositories that were used for training,” Weiss said, adding that Tabnine is working on functionality to help developers identify potential sources for longer code suggestions à la CodeWhisperer. “Tabnine took the legal and moral decision not to train the model on code that our users write.”

Past the technical hurdles of developing code-generating AI lie the user experience challenges Weiss alluded to earlier. When it comes to developer workflows, it can be tough to figure out the right place to “plug in” an AI system like Copilot, Weiss said. Kite had trouble there: according to Smith, the startup’s generative AI didn’t resonate with enough of the 500,000 developers who used the free version to make the business sustainable.

“We always say that Tabnine is 50% AI and 50% user experience, but in reality, the user experience part might be larger,” Weiss told TechCrunch. “Some key user experience challenges that will arise in the next acts of AI in software development are: How will an AI code reviewer interact with developers? What is the correct experience for sharing best practices and recommended patterns through the AI? How does AI help developers read and understand code? Risking exaggerating on pathos, I’d say that this is defining what software development in the 21st century is.”

As for monetization: can code-generating AI be commercialized successfully today, given all the roadblocks standing in the way? Perhaps. In August, Microsoft revealed that 400,000 users had subscribed to Copilot, which at $10 per seat translates to $4 million in monthly revenue — a figure that may fall short of Copilot’s R&D and operating costs. Kite, for its part, struggled to generate meaningful revenue after seven years of development.

Still, Weiss is optimistic that enterprises will see the value in code-synthesizing AI and returns will “dramatically rise.” Certain investors appear to believe that’s true; Tabnine recently raised $15.5 million from Qualcomm Ventures, Samsung Next and others.

“Tabnine, Kite and other players went through a path I strongly believe in, which is starting from the developers themselves,” Weiss said. “Monetizing developers is considered notoriously hard because of well-known reasons, but with the right product offering, it is doable, as proven by the commercial success of both Tabnine and GitHub Copilot.”