Anyscale, from the creators of the Ray distributed computing project, launches with $20.6M led by a16z

Open source has become a critical building block of modern software, and today a new startup is coming out of stealth to capitalise on one of its newer frontiers: building and managing distributed application environments. The approach is increasingly used to handle large computing projects, such as those involving artificial intelligence, scientific work or other complex calculations.

Anyscale, a startup founded by the team that built the open-source Ray distributed programming framework at UC Berkeley (Robert Nishihara, Philipp Moritz, Ion Stoica and Berkeley professor Michael I. Jordan), has raised $20.6 million in a Series A round of funding led by Andreessen Horowitz, with participation from NEA, Intel Capital, Ant Financial, Amplify Partners, 11.2 Capital and The House Fund.

The company plans to use the money to build out its first commercial products. Details are still being kept under wraps, but more generally the products will include the ability to easily scale out a computing project from one laptop to a cluster of machines, and a group of libraries and applications to manage projects. These are expected to launch next year.

“Right now we are focused on making Ray a standard for building applications,” said Stoica in an interview. “The company will build tools and a runtime platform for Ray. So, if you want to run a Ray application securely and with high performance then you will use our product.”

The funding is partly strategic: Intel is one of the big companies that have been using Ray for their own computing projects, alongside Amazon, Microsoft and Ant Financial.

“Intel IT has been leveraging Ray to scale Python workloads with minimal code modifications,” said Moty Fania, principal engineer and chief technology officer for Intel IT’s Enterprise and Platform Group, in a statement. “With the implementation into Intel’s manufacturing and testing processes, we have found that Ray helps increase the speed and scale of our hyperparameter selection techniques and auto modeling processes used for creating personalized chip tests. For us, this has resulted in reduced costs, additional capacity and improved quality.”

With an impressive user list like this for the free-to-use Ray, you might ask yourself what the purpose of Anyscale is. As Stoica and Nishihara explained, the idea is to create simpler and easier ways to implement Ray, to make it usable whether you’re one of the Amazons of the world or a more modest, and possibly less tech-centric, operation.

“We see that this will be valuable mostly for companies who do not have engineering experts,” Stoica said.

The problem that Anyscale is solving is central to the future of large-scale, involved computing projects: a growing array of problems is being tackled with computing solutions, but as the complexity of the work increases, there is a limit to how much a single machine (even a big one) can handle. (Indeed, Anyscale cites IDC figures estimating that the amount of data created and copied annually will reach 175 zettabytes by 2025.)

While one day there may be quantum-computing machines that can run efficiently and at scale to address these kinds of tasks, today this isn’t a realistic option, and so distributed computing has emerged as a solution.

Ray was devised as a standard for implementing distributed computing environments, but on its own it’s too technical for the uninitiated to use.

“Imagine you’re a biologist,” added Nishihara. “You can write a simple program and run it at a large scale, but to do that successfully you need not only to be a biology expert but a computing expert. That’s just way too high a barrier.”

The people behind Anyscale (and Ray) have a long and very credible list of other work behind them that speaks to the opportunities being spotted here. Stoica, for example, is also a co-founder of Databricks and Conviva, and was one of the original developers of Apache Spark.

“I worked on Databricks with Ion and that’s how it started,” Andreessen Horowitz co-founder Ben Horowitz said in an interview. He added that the firm has been a regular investor in projects coming out of UC Berkeley. Ray, and more specifically Anyscale, is notable for its relevance to today’s computing needs.

“With Ray it was a very attractive project because of the open-source metrics but also because of the issue it addresses,” he said.

“We’ve been grappling with Moore’s Law being over, but more interestingly, it’s inadequate for things like artificial intelligence applications,” where the computing power needed outstrips what any single machine can deliver. “You have to be able to deal with distributed computing, but the problem for everyone but Google is that distributed computing is hard, so we have been looking for a solution.”