Streamlit launches open-source machine learning application development framework

Streamlit, a new machine learning startup from industry veterans who worked at GoogleX and Zoox, launched today with a $6 million seed investment and a flexible new open-source tool to make it easier for machine learning engineers to create custom applications to interact with the data in their models.

The seed round was led by Gradient Ventures with participation from Bloomberg Beta. A who’s who of solo investors also participated, including Color Genomics co-founder Elad Gil, #Angels founder Jana Messerschmidt, Y Combinator partner Daniel Gross, Docker co-founder Solomon Hykes and Insight Data Science CEO Jake Klamka.

As for the product, Streamlit co-founder Adrien Treuille says that, as machine learning engineers themselves, he and his co-founders were in a unique position to understand the needs of engineers and build a tool to meet them. Rather than building a one-size-fits-all tool, the key was developing a solution flexible enough to serve multiple requirements, depending on the nature of the data the person is working with.

“I think that Streamlit actually has, I would say, a unique position in this market. While most companies are basically trying to systemize some part of the machine learning workflow, we’re giving engineers these sort of Lego blocks to build whatever they want,” Treuille explained.

Customized self-driving car data application built with Streamlit that enables machine learning engineers to interact with the data

Treuille says that highly trained machine learning engineers, despite their specialized skills, end up spending an inordinate amount of their time building tools to understand the vast amounts of data they have. Streamlit is trying to help them build these tools faster using the kind of programming tools they are already used to.

He says that with a few lines of code, a machine learning engineer can quickly begin building tools to understand the data and interact with it in whatever way suits that type of data. That may mean building a set of sliders tied to different variables, or simply creating tables with the subsets of data that matter to the engineer.
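To give a rough sense of what "a few lines of code" might look like, here is a minimal sketch of a slider that filters a small table. It is an illustration only, not an example from the company: the sample detections, the `filter_by_confidence` helper and the threshold are all made up, and it assumes the `streamlit` Python package is installed for the interactive part.

```python
# Hypothetical sketch: a slider that controls which rows of a table are shown.
# The data and the helper function below are invented for illustration.

try:
    import streamlit as st
except ImportError:  # lets the filtering helper be used without Streamlit
    st = None

# Made-up sample data standing in for real model output.
DETECTIONS = [
    {"frame": 1, "label": "car", "confidence": 0.92},
    {"frame": 1, "label": "pedestrian", "confidence": 0.41},
    {"frame": 2, "label": "truck", "confidence": 0.77},
    {"frame": 3, "label": "car", "confidence": 0.18},
]


def filter_by_confidence(rows, threshold):
    """Return only the rows whose confidence is at least `threshold`."""
    return [row for row in rows if row["confidence"] >= threshold]


if st is not None:
    st.title("Detection explorer")
    # Slider from 0.0 to 1.0, starting at 0.5.
    threshold = st.slider("Minimum confidence", 0.0, 1.0, 0.5)
    subset = filter_by_confidence(DETECTIONS, threshold)
    st.write(f"{len(subset)} of {len(DETECTIONS)} detections shown")
    st.table(subset)
```

Saved under a name such as `app.py` (a hypothetical filename), the script would be started with `streamlit run app.py`, and moving the slider in the browser would re-run it and refresh the table.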

Treuille says that this toolset has the potential to dramatically transform the way machine learning engineers work with the data in their models. “As people who are machine learning engineers and have seen this and know what it’s like to go through these challenges, it was really exciting for us to say, there’s a better way of doing this and not just a little bit better, but something that will turn a project that would have taken four weeks and 15,000 lines of code into something that you can do in an afternoon.”

The toolkit is available on GitHub for download starting today.