Imagine a system with 22 2U servers in a 48U rack, all cranking on 176 NVIDIA Tesla K40 GPU chips and providing an astonishing 250 teraflops per rack. We’re talking scream machines, and that’s what Cray is delivering in its latest high-performance system, the Cray CS-Storm.
Consider that a four-cabinet Cray CS-Storm system is capable of delivering more than one petaflop of peak performance. That’s a mighty powerful system.
And it comes at a time when you can get inexpensive computing power with a credit card from infrastructure providers like AWS, or cheap machines from white box PC makers. Cray doesn’t try to compete there, though. It sticks to the high-performance computing market as it always has, where companies that use these systems need full-on, pedal-to-the-metal power 24 hours a day, 365 days a year. And that’s what Cray provides, says Barry Bolding, vice president of marketing and business development at Cray.
He points out these machines are not for the faint of heart. To justify the cost of one of these babies, which he says could run you close to a million dollars fully loaded, you need to have specialized needs and you need to run them full bore, pretty much all day long, every day. When he says full on, he means it.
“The NVIDIA [Tesla] K40s run at 300 watts when doing maximum calculations,” Bolding explained. And the CS-Storm has been engineered to keep the GPUs running at that 300-watt maximum all the time, he told me. That means it delivers full power all day long.
To give you a sense of where the cost comes from, a single NVIDIA Tesla K40 chip costs around $4,000. As he said, do the math. There are up to 8 per node and a total capacity of 176 in one rack. If you have four fully loaded racks, that’s over 700 chips. We’re talking some serious power for serious cash.
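For anyone who wants to actually do the math, here is a minimal Python sketch using only the figures quoted in this article. The function name `rack_summary` and the constant names are made up for illustration, the $4,000 price is approximate, and the totals cover the GPUs alone, not the servers, networking, cooling or software.

```python
# Figures quoted in the article; the GPU price is an approximate number.
SERVERS_PER_RACK = 22     # 2U servers in one 48U rack
GPUS_PER_SERVER = 8       # "up to 8 per node"
GPU_PRICE_USD = 4_000     # approximate price of one Tesla K40
GPU_MAX_WATTS = 300       # power draw at maximum calculations
RACK_PEAK_TFLOPS = 250    # quoted peak performance per rack


def rack_summary(racks=1):
    """Return GPU-only totals for a number of fully loaded racks."""
    gpus = racks * SERVERS_PER_RACK * GPUS_PER_SERVER
    return {
        "gpus": gpus,
        "gpu_cost_usd": gpus * GPU_PRICE_USD,
        "gpu_power_kw": gpus * GPU_MAX_WATTS / 1000,
        "peak_tflops": racks * RACK_PEAK_TFLOPS,
    }


one_rack = rack_summary(1)
four_racks = rack_summary(4)
print(one_rack)
print(four_racks)
```

Under these assumptions one full rack holds 176 GPUs worth roughly $704,000, which squares with the "close to a million dollars fully loaded" figure once the rest of the hardware is added. Four racks come to 704 GPUs and about 1,000 teraflops, or roughly one petaflop.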
Bolding said the system hardware is built on the air-cooled Cray CS300 system. The software includes the Cray Advanced Cluster Engine cluster management software for resource planning and scheduling, plus the complete Cray Programming Environment for building applications on top of the Cray platform. There is also an integration layer to work with other software. What’s more, customers don’t have to get the fully configured super-duper system if they don’t want it. They can save money by buying just five or 10 servers at whatever configuration meets their needs and budget.
Bolding says the systems have been designed for highly specialized types of jobs that require intense computing, such as planet-wide weather forecasting, seismic measuring for oil exploration, financial analysis or crash simulations. All of these require a steady diet of power, with thousands of simultaneous calculations that you just can’t break down into a single job. And the intensity is ongoing, not just for a short burst.
That’s why he says the cloud just isn’t cost-effective in these instances. While cloud services are good for that short burst, a short-term capacity boost or even steady scaling, the supercomputer comes into play when you need constant, intense computing power. Bolding explained that Cray uses the highest quality components, designed to run at maximum capacity virtually all of the time without breaking.
He claims companies that need this kind of power can justify the cost because other solutions won’t deliver the efficiency they get from Cray’s engineering. He boasts that Cray uses the highest quality parts with the best signal quality, so that the components all work together at the maximum capacity the hardware allows.
If you have the need, you can justify the cost and get to work. If not, you can sit back and drool and dream your geek dreams like the rest of us.