IBM wants to bring machine learning to the mainframe

IBM wants to bring machine learning to its traditional mainframe customers, and eventually to any technology with large data stores hidden behind a company firewall in what IBM calls a “private cloud.”

Yes, mainframes, those ginormous computing machines from an earlier age, are still running inside some of the world’s biggest companies, including banks, insurance companies, airlines and large retailers. In fact, according to IBM, a modern IBM z Systems mainframe can process up to 2.5 billion transactions per day, the equivalent of roughly 100 Cyber Mondays every day.
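For a sense of scale, IBM's daily figure can be converted into a per-second rate. The short calculation below is only back-of-the-envelope arithmetic on the two numbers IBM quotes; the per-Cyber-Monday value is simply what the "100 Cyber Mondays" comparison implies, not a figure IBM states separately.

```python
# Figures quoted by IBM in the article
TRANSACTIONS_PER_DAY = 2_500_000_000   # stated peak for one z Systems mainframe
CYBER_MONDAYS_PER_DAY = 100            # IBM's rough comparison

SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if the daily peak were spread evenly across the day
per_second = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY

# Transactions in one "Cyber Monday" as implied by the comparison
per_cyber_monday = TRANSACTIONS_PER_DAY // CYBER_MONDAYS_PER_DAY

print(f"{per_second:,.0f} transactions per second")
print(f"{per_cyber_monday:,} transactions per Cyber Monday (implied)")
```

That works out to just under 29,000 transactions per second on average.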

IBM wants to bring some core Watson machine learning smarts to its mainframe clients — and eventually to any computing done inside the data center — to allow them to take advantage of all that data in a more modern machine learning context.

“Over 90 percent of the data in the world can’t be Googled. It resides behind firewalls on private clouds. How do we automate intelligence [for these data sources]?” IBM analytics general manager Rob Thomas asked.

IBM wants to provide data scientists with the same types of machine learning capabilities in a mainframe environment that they are used to finding in the cloud. The goal is to automate the often monotonous work of creating, testing and deploying analytical models. The solution works with popular open source tools including languages like Scala, Java and Python, and machine learning frameworks like Apache SparkML, TensorFlow and H2O. It’s also designed to work with virtually any data type the customer brings to the table.

Beyond integrating the open source tools, what IBM is offering, the secret sauce if you will, is Cognitive Assist for Data Science from IBM Research. It helps choose the best algorithm for the data by checking the data against a list of available algorithms and selecting the one that best meets the data scientist’s needs, based on the model type and how fast he or she needs the results.
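IBM has not published how Cognitive Assist works internally, so the following is only a minimal sketch of the general idea described above: try each algorithm in a catalog that matches the requested model type, discard any that exceed a time budget, and keep the one that scores best on held-out data. Every name here (CANDIDATES, select_algorithm, the two toy models) is hypothetical and exists purely for illustration; none of it is IBM's API.

```python
import time
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

# Hypothetical catalog of available algorithms: (name, model type, fit function)
CANDIDATES = [
    ("mean_baseline", "regression", fit_mean),
    ("linear", "regression", fit_linear),
]

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def select_algorithm(train, holdout, model_type, time_budget_s):
    """Return (name, holdout_mse) for the best candidate that trains
    within the time budget, or None if no candidate qualifies."""
    best = None
    for name, kind, fit in CANDIDATES:
        if kind != model_type:
            continue  # wrong kind of model for this task
        start = time.perf_counter()
        model = fit(*train)
        elapsed = time.perf_counter() - start
        if elapsed > time_budget_s:
            continue  # too slow for how fast the user needs results
        score = mse(model, *holdout)
        if best is None or score < best[1]:
            best = (name, score)
    return best

# Toy data: y = 2x + 1, split into alternating train and holdout points
xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
train = (xs[::2], ys[::2])
holdout = (xs[1::2], ys[1::2])

result = select_algorithm(train, holdout, "regression", time_budget_s=1.0)
print(result)
```

On this toy data the linear model wins because it fits the underlying relationship, while the baseline does not. A real system would presumably search a far larger catalog and use richer signals than a single holdout score.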

The process should get smarter over time as it ingests more data and sees how the algorithms behave against different data sources. “This allows data scientists to build a model and IBM Machine Learning technology will choose the best algorithm. It then builds a feedback loop because as more data comes in, the algorithm gets updated and gets smarter,” Thomas said.

While the earliest forms of what we call artificial intelligence and machine learning were done on mainframes decades ago, Thomas says this set of tools allows companies running mainframes to take advantage of machine learning technologies in a much more cost-effective way, partly because of open source, and partly because of the algorithms IBM has built to do much of the manual work for them.

He also argues that processing this data in place on the mainframe with these tools is much more cost-effective and practical than moving the same data to the cloud.

This capability will be available for mainframe customers later this quarter. IBM plans to bring machine learning to other data sources sitting inside data centers over time.