MLCommons wants to create AI benchmarks for laptops, desktops and workstations

As AI increasingly moves from the cloud to on-device, how, exactly, is one supposed to know whether a given new laptop will run a generative-AI-powered app faster than rival off-the-shelf laptops — or desktops or all-in-ones, for that matter? Knowing could mean the difference between waiting a few seconds for an image to generate versus a few minutes — and as they say, time is money.

MLCommons, the industry group behind a number of AI-related hardware benchmarking standards, wants to make it easier to comparison shop with the launch of performance benchmarks targeted at “client systems” — that is, consumer PCs.

Today, MLCommons announced the formation of a new working group, MLPerf Client, whose goal is establishing AI benchmarks for desktops, laptops and workstations running Windows, Linux and other operating systems. MLCommons promises that the benchmarks will be “scenario-driven,” focusing on real end user use cases and “grounded in feedback from the community.”

To that end, MLPerf Client’s first benchmark will focus on text-generating models, specifically Meta’s Llama 2, which MLCommons executive director David Kanter notes has already been incorporated into MLCommons’ other benchmarking suites for datacenter hardware. Meta’s also worked extensively with Qualcomm and Microsoft to optimize Llama 2 for Windows — much to the benefit of Windows-running devices.

“The time is ripe to bring MLPerf to client systems, as AI is becoming an expected part of computing everywhere,” Kanter said in a press release. “We look forward to teaming up with our members to bring the excellence of MLPerf into client systems and drive new capabilities for the broader community.”

Members of the MLPerf Client working group include AMD, Arm, Asus, Dell, Intel, Lenovo, Microsoft, Nvidia and Qualcomm — but notably not Apple.

Apple isn’t a member of MLCommons, either, and a Microsoft engineering director (Yannis Minadakis) co-chairs the MLPerf Client group — which makes the company’s absence not entirely surprising. The disappointing outcome, however, is that whatever AI benchmarks MLPerf Client conjures up won’t be tested across Apple devices — at least not in the near-ish term.

Still, this writer’s curious to see what sort of benchmarks and tooling emerge from MLPerf Client, macOS-supporting or no. Assuming GenAI is here to stay — and there’s no indication that the bubble is about to burst anytime soon — I wouldn’t be surprised to see these types of metrics play an increasing role in device-buying decisions.

In my best-case scenario, the MLPerf Client benchmarks are akin to the many PC build comparison tools online, giving an indication as to what AI performance one can expect from a particular machine. Perhaps they’ll even expand to cover phones and tablets in the future, given Qualcomm’s and Arm’s participation (both are heavily invested in the mobile device ecosystem). It’s clearly early days — but here’s hoping.