Products developed to manage artificial intelligence data are still largely fragmented, solving one problem at a time for developers rather than covering the entire life cycle.
Enter Sama, a company providing high-quality training data that powers AI technology applications. CEO Wendy Gonzalez said the company is developing the first end-to-end AI tool for training data through machine learning.
To do this, the company secured an oversubscribed $70 million in Series B financing led by Caisse de dépôt et placement du Québec (CDPQ), with participation from First Ascent Ventures, Salesforce Ventures, Vistara Growth and all existing investors.
The new capital infusion comes two years after the company raised $14.8 million in a Series A round. At that time, Sama’s thesis centered on developing high-accuracy training data, and the company had built tools in which annotation could take place, Gonzalez said.
The team then looked into how to inject machine learning into that process while still maintaining high accuracy. They believed it came down to humans in the loop and developed one-click to human-in-the-loop validation with their Sama Machine Learning Assisted Annotation MicroModels, which also launched Thursday.
“What we continued to learn is that data was required at every stage of the AI lifecycle, but everything was fragmented,” she added. “You were having to transform the data eight or nine times with different partners.”
Going after the Series B was purposeful. Sama aimed to develop an end-to-end platform that would be a frictionless way to get data, have it annotated with high accuracy and then put that data into a model. All of that required funds, Gonzalez said.
The Sama team got to know CDPQ through its Montreal network and felt a connection to the firm’s mission and ESG mandate, which resonated with Sama’s own mission.
For example, Gonzalez noted that Sama is the only certified B Corp in the AI infrastructure space and has a mission to move people out of poverty. It has already helped 56,000 people, hiring people from East Africa as expert labelers, 50% of them women.
“They cared about the problem we were solving with AI and cared about how we were doing it,” she added. “Infrastructure data is what is going to power everything in AI, so there is a tremendous opportunity for growth. Beyond that, we want to form a social mission to be the largest, if not one of the largest, B corporations powering the most innovative technology. It would also be amazing to change the way corporations think about social good, too, because diverse businesses are better businesses. It’s a lofty goal.”
Wils Theagene, senior director of CDPQ in Quebec, said CDPQ is the second-largest pension fund in Canada, with CA$390 billion in assets under management.
The firm’s $250 million Equity 25³ fund was inspired by the death of George Floyd in 2020, and encourages companies to use diversity as a growth vector, he said. Companies the fund invests in have five years to reach 25% diversity on their boards, management teams and equity ownership.
Just as Sama was attracted to CDPQ’s ESG mission, Theagene said the firm liked Sama’s focus on progress, performance and social mission.
He noted that “the management team is one of the best we came across and were impressed with Wendy.” When the Sama team was explaining its performance and the industry, CDPQ felt their model for economic development in countries in need of support was the right one: instead of providing financial aid, giving people jobs to help them take ownership of their future, Theagene added.
“We believe Sama is the market leader in AI and will be the company leading the market in the future,” he said. “Their new micromodels allow them to attack AI data in an efficient manner, and its end-to-end platform is meeting the requirements companies have around data for AI applications.”
Meanwhile, Sama is working with companies like Google, Walmart and Nvidia and plans to continue managing its growth. Gonzalez said that since she joined the company six years ago, Sama’s monthly recurring revenue has grown 13-fold.
The next steps are all about accelerating coverage of the AI data pipeline, launching in new markets, like Europe and eventually Asia Pacific, and building out operations.
Looking to the future of AI data, Gonzalez said reducing bias, so that data is more representative, is top of mind.
“Similar to the European Union and ethics in AI, it would not surprise me if the U.S. goes the route of GDPR with guidelines for data protection so that people are designing in AI with transparency and purpose,” she added. “We want to have a platform built by a diverse population where then we can be diverse in what data is collected and in who is labeling it.”