Stability AI, Hugging Face and Canva back new AI research nonprofit

Developing cutting-edge AI systems like ChatGPT requires massive technical resources, in part because they’re costly to develop and run. While several open source efforts have attempted to reverse-engineer proprietary, closed source systems created by commercial labs such as Alphabet’s DeepMind and OpenAI, they’ve often run into roadblocks — mainly due to a lack of capital and domain expertise.

Hoping to avoid this fate, one community research group, EleutherAI, is forming a nonprofit foundation. The organization today announced it’ll found a not-for-profit research institute, the EleutherAI Institute, funded by donations and grants from backers, including AI startups Hugging Face and Stability AI, former GitHub CEO Nat Friedman, Lambda Labs and Canva.

“Formalizing as an organization allows us to build a full time staff and engage in longer and more involved projects than would be feasible as a volunteer group,” Stella Biderman, an AI researcher at Booz Allen Hamilton who will co-run the EleutherAI Institute, told TechCrunch in an email interview. “In terms of a nonprofit specifically, I think it’s a no-brainer given our focus on research and the open source space.”

EleutherAI started several years ago as a grassroots collection of developers working to open source AI research. Its founding members — Connor Leahy, Leo Gao and Sid Black — wrote the code and collected the data needed to create a machine learning model close to OpenAI’s text-generating GPT-3, which at the time was getting a lot of press.

The group curated and open sourced The Pile, a collection of datasets designed for training GPT-3-like models to complete text, write code and more. And it released several models under the Apache 2.0 license, including GPT-J and GPT-NeoX, language models that for a while fueled an entirely new wave of startups.

To train its models, EleutherAI relied mostly on the TPU Research Cloud, a Google Cloud program that supports projects with the expectation that the results will be shared publicly. CoreWeave, a U.S.-based cryptocurrency miner that provides cloud services for AI workloads, also supplied compute resources to EleutherAI in exchange for models its customers can use and serve.

EleutherAI grew quickly. Today, over 20 of the community’s regular contributors are working full-time, focusing mainly on research. And over the past 18 months, EleutherAI members have co-authored 28 academic papers, trained dozens of models and released 10 codebases.

But the fickle nature of its cloud providers sometimes forced EleutherAI to scuttle its plans. Originally, the group had intended to release a model roughly the size of GPT-3 in terms of the number of parameters, but ended up shelving that roadmap for technical and funding reasons. (In AI, parameters are the parts of the model learned from historical training data and essentially define the skill of the model on a problem, such as generating text.)

In late 2022, EleutherAI became well-acquainted with Stability AI, the now-well-financed startup behind the image-generating AI system Stable Diffusion. Along with other collaborators, EleutherAI helped to create the initial version of Stable Diffusion. And since then, Stability AI has donated a portion of compute from its AWS cluster for EleutherAI’s ongoing language model research.

Later, another big patron, Hugging Face, approached EleutherAI, and discussions about forming a nonprofit kicked off, Biderman says. (Many EleutherAI staff were involved with Hugging Face’s BigScience effort, which sought to train and open source a model akin to GPT-3 over the course of a year.)

“EleutherAI has largely focused on large language models that are architecturally similar to ChatGPT in the past, and will likely continue to do so,” Biderman said. “Beyond training large language models, we are excited to devote more resources to ethics, interpretability and alignment work.”

One might wonder whether the involvement of commercially motivated ventures like Stability AI and Hugging Face, both of which are backed by substantial venture capital, could influence EleutherAI’s research. It’s a natural assumption, and one with evidence behind it. At least one study shows a direct correlation between donations and the likelihood that nonprofits speak up about a proposed government rule.

Biderman asserts that the EleutherAI Institute will remain independent and says she doesn’t see a problem with the donor pool so far.

“We don’t develop models at the behest of commercial entities,” Biderman said. “If anything, I think that having a diverse sponsorship improves our independence. If we were fully funded by one tech company, that seems like a much bigger potential issue from our end.”

Another challenge the EleutherAI Institute will have to overcome is ensuring its coffers don’t run dry. OpenAI is a cautionary tale; after being founded as a nonprofit in 2015, the company later transitioned to a “capped-profit” structure in order to fund its ongoing research.

Broadly speaking, nonprofit initiatives to fund AI research have been a mixed bag.

Among the success stories is the Allen Institute for AI (AI2), founded by the late Microsoft co-founder Paul Allen, which aims to achieve scientific breakthroughs in AI and machine learning. There’s also the Alan Turing Institute, the U.K.-based, government-funded national institute for data science and machine learning. Smaller promising efforts include AI startup Cohere’s Cohere For AI (despite its corporate ties) and Timnit Gebru’s Distributed AI Research, a global distributed research organization.

But for every AI2, there’s former Google chairman Eric Schmidt’s fund for AI research. Over $125 million in size, it attracted fresh controversy after Politico reported that Schmidt wields an unusually heavy sway over the White House Office of Science and Technology Policy.

Time will tell which direction the EleutherAI Institute ultimately takes. Its mission will likely evolve over time, and we can only hope it does so in positive ways.