Intel, Arm and Nvidia propose new standard to make AI processing more efficient

In pursuit of faster and more efficient AI system development, Intel, Arm and Nvidia today published a draft specification for what they refer to as a common interchange format for AI. While voluntary, the proposed “8-bit floating point (FP8)” standard, they say, has the potential to accelerate AI development by optimizing hardware memory usage, and it works for both AI training (i.e., engineering AI systems) and inference (running the systems).

When developing an AI system, data scientists are faced with key engineering choices beyond simply collecting data to train the system. One is selecting a format to represent the weights of the system — weights being the factors learned from the training data that influence the system’s predictions. Weights are what enable a system like GPT-3 to generate whole paragraphs from a sentence-long prompt, for example, or DALL-E 2 to create photorealistic portraits from a caption.

Common formats include half-precision floating point, or FP16, which uses 16 bits to represent the weights of the system, and single precision (FP32), which uses 32 bits. Half-precision and lower reduce the amount of memory required to train and run an AI system while speeding up computations and even reducing bandwidth and power usage. But they sacrifice some accuracy to achieve those gains; after all, 16 bits is less to work with than 32.
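
To make the tradeoff concrete, here is a rough Python sketch of the arithmetic. It is an illustration rather than anything taken from the draft specification: the one-billion-weight model size is a hypothetical figure, the helper names are invented for this example, and the memory estimate counts only the raw weights. Python's standard library can round-trip FP32 and FP16 values but has no FP8 type, so the precision check stops at 16 bits.

```python
import struct

# Bits used per weight in each format discussed above.
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "FP8": 8}


def weight_memory_bytes(num_weights: int, fmt: str) -> int:
    """Bytes needed to store num_weights raw weights in the given format."""
    return num_weights * BITS_PER_WEIGHT[fmt] // 8


def round_trip(value: float, code: str) -> float:
    """Store a value in a narrower format and read it back.

    code is a struct format string: "<f" for FP32, "<e" for FP16.
    """
    return struct.unpack(code, struct.pack(code, value))[0]


if __name__ == "__main__":
    num_weights = 1_000_000_000  # hypothetical model size, for illustration only
    for fmt in BITS_PER_WEIGHT:
        gigabytes = weight_memory_bytes(num_weights, fmt) / 1e9
        print(f"{fmt}: {gigabytes:.0f} GB for the weights alone")

    weight = 0.1  # an arbitrary example value
    print("FP32 rounding error:", abs(round_trip(weight, "<f") - weight))
    print("FP16 rounding error:", abs(round_trip(weight, "<e") - weight))
```

Each halving of the bit width halves the storage for the weights, while the rounding error on the same value grows as the format narrows, which is the accuracy cost described above.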

Many in the industry, including Intel, Arm and Nvidia, are coalescing around FP8 (8 bits) as the sweet spot, however. In a blog post, Nvidia director of product marketing Shar Narasimhan notes that the proposed FP8 format shows “comparable accuracy” to 16-bit precisions across use cases including computer vision and image-generating systems while delivering “significant” speedups.

Nvidia, Arm and Intel say they’re making their FP8 format open and license-free. A white paper describes it in more detail; Narasimhan says that the specs will be submitted to the IEEE, the professional organization that maintains standards across a number of technical domains, for consideration at a later date.

“We believe that having a common interchange format will enable rapid advancements and the interoperability of both hardware and software platforms to advance computing,” Narasimhan said.

The trio isn’t pushing for parity out of the goodness of their hearts, necessarily. Nvidia’s GH100 Hopper architecture natively implements FP8, as does Intel’s Gaudi2 AI training chipset.

But a common FP8 format would also benefit rivals like SambaNova, AMD, Groq, IBM, Graphcore and Cerebras — all of which have experimented with or adopted some form of FP8 for system development. In a blog post this July, Graphcore co-founder and CTO Simon Knowles wrote that the “advent of 8-bit floating point offers tremendous performance and efficiency benefits for AI compute,” asserting that it’s also “an opportunity” for the industry to settle on a “single, open standard” rather than ushering in a mix of competing formats.