Inference.ai matches AI workloads with cloud GPU compute

GPUs’ ability to perform many computations in parallel makes them well-suited to running today’s most capable AI. But GPUs are becoming tougher to procure, as companies of all sizes increase their investments in AI-powered products.

Nvidia’s best-performing AI cards sold out last year, and the CEO of chipmaker TSMC suggested that general supply could be constrained into 2025. The problem’s so acute, in fact, that it has the U.S. Federal Trade Commission’s attention — the agency recently announced it’s investigating several partnerships between AI startups and cloud giants like Google and AWS over whether the startups might have anti-competitive, privileged access to GPU compute.

What’s the solution? It depends on your resources, really. Tech giants like Meta, Google, Amazon and Microsoft are buying up what GPUs they can and developing their own custom chips. Ventures with fewer resources are at the mercy of the market — but it doesn’t have to be that way forever, say John Yue and Michael Yu.

Yue and Yu are the co-founders of Inference.ai, a platform that provides infrastructure-as-a-service cloud GPU compute through partnerships with third-party data centers. Inference uses algorithms to match companies’ workloads with GPU resources, Yue says — aiming to take the guesswork out of choosing and acquiring infrastructure.

“Inference brings clarity to the confusing hardware landscape for founders and developers with new chips coming from Nvidia, Intel, AMD, Groq [and so on] — allowing higher throughput, lower latency and lower cost,” Yue said. “Our tools and team allow for decision-makers to filter out a lot of the noise and quickly find the right fit for their project.”

Inference essentially provides customers with a GPU instance in the cloud, along with 5TB of object storage. The company claims that, thanks to its algorithmic matching tech and deals with data center operators, it can offer dramatically cheaper GPU compute with better availability than major public cloud providers.

“The hosted GPU market is confusing and changes daily,” Yue said. “Plus, we’ve seen pricing vary up to 1,000% for the same configuration. Our tools and team allow for decision makers to filter out a lot of the noise and quickly find the right fit for their project.”

Now, TechCrunch wasn’t able to put those claims to the test. But regardless of whether they’re true, Inference has competition — and lots of it.

See: CoreWeave, a crypto mining operation-turned-GPU provider, which is reportedly expected to rake in around $1.5 billion in revenue by 2024. Its close competitor, Lambda Labs, secured $300 million in venture capital last October. There’s also Together — a GPU cloud — not to mention startups like Run.ai and Exafunction, which aim to reduce AI dev costs by abstracting away the underlying hardware.

Inference’s investors seem to think there’s room for another player, though. The startup recently closed a $4 million round from Cherubic Ventures, Maple VC and Fusion Fund, which Yue says is being put toward building out Inference’s deployment infrastructure.

In an emailed statement, Cherubic’s Matt Cheng added:

“The requirements for processing capacity will keep on increasing as AI is the foundation of so many of today’s products and systems. We’re confident that the Inference team, with their past knowledge in hardware and cloud infrastructure, has what it takes to succeed. We decided to invest because accelerated computing and storage services are driving the AI revolution, and [Inference’s] product will fuel the next wave of AI growth.”