By Anup Ojah, Senior Manager of Global HPC, Oracle Cloud Engineering, and Amy Sorrells, Global Communications Director, Oracle for Startups
Even as both legacy and greenfield applications have rushed to the public cloud in recent years, some of the world’s most demanding workloads have stubbornly stayed in private data centers.
High Performance Computing (HPC) has been a cloud holdout for good reason—it demands data-crunching capabilities that can’t be achieved by scaling commodity hardware.
For that reason, 99 percent of the world’s HPC jobs still run on-premises, often leaving the scientists, engineers, and product developers who perform critical simulations and batch calculations stuck waiting for limited access to computing power. And the expense of those difficult-to-manage systems has locked many startups out of innovating in emerging technology sectors.
That’s all changing—recent breakthroughs in cloud architecture are finally democratizing the field.
Oracle has been leading that charge with its next-generation Oracle Cloud Infrastructure (OCI). OCI was built from the ground up to take advantage of state-of-the-art technologies that make it possible to run hundreds, even thousands, of CPU or GPU cores in unison as a powerful supercomputer.
Thanks to OCI’s advances in networking, memory, storage, and software, HPC practitioners can finally benefit from the flexibility, cost-efficiency, and ease of use of the cloud, all with the peace of mind of top-tier security.
Cutting-edge startups with compute-intensive workloads are recognizing the opportunity: last year, the number of AI, machine learning, and big data companies that enrolled in the Oracle for Startups program grew by more than 136 percent. And we’re on track for even higher numbers this year, aided by our partnership with NVIDIA Inception.
Those are companies like GridMarkets, which is serving HPC-powered solutions to industries as diverse as pharma and media; the same platform that runs massive simulations for drug discovery can render animations for film studios.
Other startups are using cloud-based HPC systems to tackle difficult data-analysis problems with AI and advanced analytics. Among them, Kinetica is tapping Oracle Cloud HPC capacity to analyze huge datasets, including drone data collected on the San Francisco Bay to combat pollution.
A supercomputing cloud
HPC systems allow engineers to model the behavior of new products, be they airplanes or microchips; meteorologists to predict the weather; and data scientists to train artificial intelligence.
In years past, those jobs requiring massively parallel processing have been run on what is typically called a “supercomputer”—a somewhat vague designation for any system that performs at an elite level.
Despite the enormous capital expense of buying and maintaining those on-premises systems (as well as the fact that after a few years, they aren’t quite as “super” as they once were), they continue to capture the vast majority of HPC workloads, even as we see the rapid migration to cloud of most other enterprise software.
The reason is that HPC functionality isn’t attainable just by provisioning larger clusters of virtual machines. There comes a point where throwing more processing cores at a job yields diminishing returns. Scaling massively parallel clusters effectively requires unique networking capabilities between cloud servers.
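To make the diminishing-returns point concrete, here is a minimal sketch based on Amdahl's law, a standard textbook model rather than anything specific to OCI. The function name and the 5 percent serial fraction are illustrative assumptions, and the model ignores communication cost entirely, which in real clusters usually makes the curve flatten even sooner.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law.

    serial_fraction: share of the job that cannot be parallelized (0..1).
    cores: number of processing cores applied to the job.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed value for illustration only: 5% of the job is serial.
    serial = 0.05
    for cores in (1, 8, 64, 512, 4096):
        print(f"{cores:>5} cores -> speedup {amdahl_speedup(serial, cores):.2f}x")
```

Under this assumption, the speedup can never exceed 1 / 0.05 = 20x, no matter how many cores are added. That ceiling is why the interconnect between servers, not the raw core count, becomes the limiting factor.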
In building a second-generation cloud to tackle that challenge, Oracle started with bare-metal machines outfitted with the latest chips for enabling HPC workloads.
The next step was to reduce latency and increase throughput by deploying InfiniBand, an advanced networking standard, within its cloud regions across the globe. Oracle then supercharged those lightning-fast networks with Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE v2), a technology that lets servers access each other’s memory directly, bypassing the operating system.
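A rough way to see why latency matters as much as throughput is the simple latency-plus-bandwidth cost model often used for message passing. The sketch below is illustrative only: the function name and every number in it are hypothetical assumptions, not measurements of OCI or of any particular network.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimated time to send one message, in microseconds.

    Model: fixed per-message latency plus size divided by bandwidth.
    bits / (Gbps * 1e9 bits per second) gives seconds; times 1e6 gives microseconds.
    """
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth_gbps must be positive")
    bits = message_bytes * 8
    return latency_us + bits / (bandwidth_gbps * 1e3)


if __name__ == "__main__":
    # Hypothetical figures chosen only to show the shape of the trade-off.
    for label, latency in (("higher-latency path", 50.0), ("lower-latency path", 2.0)):
        small = transfer_time_us(1_024, latency, 100.0)
        large = transfer_time_us(64 * 1_024 * 1_024, latency, 100.0)
        print(f"{label}: 1 KiB in {small:.2f} us, 64 MiB in {large:.2f} us")
```

In this model, small messages are dominated by the fixed latency term, while large transfers are dominated by bandwidth. Tightly coupled HPC jobs tend to exchange many small messages, which is why trimming per-message overhead can matter more than adding raw bandwidth.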
OCI was also the first major cloud to implement “off-box” virtualization, pulling network and I/O virtualization out of the software stack and onto the network. Dedicated physical hosts are connected by a fully software-defined layer 3 network topology with no hypervisor overhead, which means no shared resources and no noisy neighbors.
By removing the hypervisor layer, Oracle delivers not only superior network performance, but also better security through isolation.
Those innovations have not only made it practical for scientists, engineers, and startups to run HPC jobs whenever they need to, but have also allowed them to do so at a better price.
Oracle for Startups participants taking advantage of HPC on OCI report price-performance improvements of 40 to 70 percent. And what those startups are doing is amazing.
Skin Analytics, a partner of the NHS, needed cost-effective access to HPC systems to build the cutting-edge AI it deploys to detect skin cancer. With those cloud resources from Oracle, the company has developed advanced image processing techniques to map, analyze and manage skin lesions.
Another startup in the Oracle program, DeepZen, is helping bring audio books and other text-to-voice products to the masses. The company needed a flexible HPC platform on which neural networks and natural language processing could train “cloned” voices to articulate emotions and expressions.
GridMarkets, Kinetica, Skin Analytics, and DeepZen are just the vanguard of startups poised to set off a wave of innovation, now that they can readily access what had previously been one of the world’s most restricted computing resources.