Sweep aims to automate basic dev tasks using large language models

Developers spend a lot of time on mundane, repetitive tasks — and surprisingly little on actual coding.

In Stack Overflow’s 2022 developer survey, 63% of respondents said that they devote more than 30 minutes a day to searching for answers or solutions to problems — which adds up to between 333 and 651 hours lost per week across a team of 50 developers. A separate poll from Propeller Insights and Rollbar found that over a third of developers spend around a quarter of their time fixing bugs, while slightly more than a quarter (26%) set aside up to half their time for bug fixing.

The trend frustrated William Zeng and Kevin Lu. So earlier this year, they — both veterans of Roblox, the video-game-turned-social-network — created a platform called Sweep to autonomously handle dev tasks like high-level debugging.

“We started Sweep after working at Roblox together and constantly dealing with software chores we knew could be automated with AI,” Zeng, Sweep’s CEO, told TechCrunch in an email interview. “Sweep is like an AI-powered junior dev for software teams.”

TechCrunch previously covered Sweep during Y Combinator’s Summer 2023 Demo Day. But since then, the startup has closed a new financing round, raising $2 million from Goat Capital, Replit CEO Amjad Masad, Replit VP of AI Michele Catasta and Exceptional Capital at a $25 million post-money valuation.

Sweep allows devs to describe a request in natural language — for example, “add debug logs to my data pipeline” — outside of an IDE and generate the corresponding code. The platform can then push that code to the appropriate codebase via a pull request, and address comments made on the pull request by either code maintainers or owners — a bit like GitHub Copilot, but more autonomous.

“Sweep allows engineers to ship faster,” Zeng said. “We’ll handle tech debt accumulated with every code change, such as improving error logs and adding unit tests in addition to refactoring inefficient code.”

Sweep, which specializes in writing Python code, leverages a combination of AI models for code generation. They include OpenAI’s GPT-4, but also a custom “code search engine” — importantly not trained on Sweep customer data, Zeng says — that helps plan and execute “repository-wide” code changes.

“We built our own code search engine for Python, which leverages lexical and vector search techniques,” Zeng added. Lexical search looks for literal matches of — or slight variations on — portions of code, while vector search can surface more loosely related code that shares certain characteristics. “We have one of the best unit test generation abilities available and will run and execute tests in real time,” he continued.
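The distinction between the two techniques can be sketched in a few lines of Python. To be clear, this is a generic illustration, not Sweep’s actual engine: it ranks snippets by literal token overlap (lexical) and by cosine similarity over bag-of-words vectors (a stand-in for the learned embeddings a production vector search would use). The `SNIPPETS` corpus and function names are invented for the example.

```python
import math
import re
from collections import Counter

# A toy corpus standing in for a real codebase.
SNIPPETS = [
    "def read_csv(path): return open(path).readlines()",
    "def load_data(filename): rows = parse_file(filename); return rows",
    "def add(a, b): return a + b",
]

def tokenize(code):
    # Split identifiers and keywords into lowercase word tokens
    # (so "read_csv" yields "read" and "csv").
    return re.findall(r"[a-zA-Z]+", code.lower())

def lexical_search(query, snippets):
    # Lexical search: rank snippets by how many query tokens
    # appear verbatim in the snippet.
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(s))), s) for s in snippets]
    return [s for score, s in sorted(scored, key=lambda p: p[0], reverse=True) if score > 0]

def vector_search(query, snippets):
    # Vector search: embed query and snippets, rank by cosine similarity.
    # Here the "embedding" is just a bag-of-words Counter; real systems use
    # learned embeddings so that, e.g., "load" and "read" land close together.
    def vec(text):
        return Counter(tokenize(text))

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vec(query)
    scored = [(cosine(qv, vec(s)), s) for s in snippets]
    return [s for score, s in sorted(scored, key=lambda p: p[0], reverse=True) if score > 0]

print(lexical_search("read csv path", SNIPPETS)[0])
print(vector_search("load the data rows", SNIPPETS)[0])
```

With real embeddings, the vector branch is what lets a query like “parse a file into rows” find `load_data` even though no token matches literally — the property Zeng is describing when he says vector search finds “more loosely related code.”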

In the future, Sweep plans to beef up its platform’s code generation capabilities with StarCoder, the open source code-generating model from Hugging Face and ServiceNow.

Given AI’s tendency to make mistakes, though, I’m a little skeptical of Sweep’s reliability over the long run. A Stanford-affiliated research team found that engineers who use AI tools are more likely to cause security vulnerabilities in their apps because the tools often generate code that appears to be superficially correct but poses security issues.

There’s also the copyright question. Some code-generating models — not necessarily StarCoder or Sweep’s own, but others — are trained on copyrighted code or code under a restrictive license, and these models can regurgitate this code when prompted in a certain way. Legal experts have argued that these tools could put companies at risk if they were to unwittingly incorporate copyrighted suggestions from the tools into their production software.

Sweep’s solution is to prompt users to review and edit any generated code themselves before pushing changes to the target codebase.

“The main challenges affecting AI developer tools are around reliability and managing large codebases,” Zeng said. “We’re using our knowledge around both older and newer methods to make Sweep robust.”

Sweep charges a pretty penny for its services — $480 per seat per month. (By contrast, the business-focused tiers for GitHub Copilot and Amazon CodeWhisperer cost around $20 per user per month.) But that apparently hasn’t dissuaded customers. Zeng claims that Sweep, with a rather humble war chest totaling $2.8 million, has enough capital coming in from clientele to “last the company years.”

“The new money will be for expanding our team in the coming year from two employees to five,” he continued. “We’re going to continue focusing on Python, and improving across all areas of tech debt from unit testing, refactoring and handling leftover to-dos in the code.”