Streamlit wants to be for data science what business intelligence tools have been for databases: A quick way to get to results, without bothering much with the details.
We were confused at first when we got the news. We interpreted “application framework for machine learning and data science” to mean some new framework for working with data, such as PyTorch, DeepLearning4j, and Neuton, to name just a few among many others out there.
So, our first reaction was: Another one, how is it different? Truth is, Streamlit is not a framework for working with data per se. Rather, it is a framework for building data-driven applications. That makes it different to begin with, and there’s more.
Streamlit is aimed at people who don’t necessarily know or care much about application development: Data scientists. It was created by a rock star team of data scientists who met in 2013 while working at Google X. It is open source and has been spreading like wildfire, counting some 200,000 applications built since late 2019.
Today Streamlit announced that it has secured $21 million in Series A funding. ZDNet connected with CEO Adrien Treuille to discuss what makes Streamlit special, and where it, and data-driven applications at large, are going next.
To listen to the conversation with Treuille in its entirety, you can head to the Orchestrate All the Things podcast.
From zero to hero: from datasets and models to applications
The investment was co-led by Gradient Ventures and GGV Capital, with additional participation from Bloomberg Beta, Elad Gil, Daniel Gross, and others. Glenn Solomon, a managing partner at GGV Capital, said that:
“Adapting quickly to new information and insights is one of the biggest challenges facing companies today. Streamlit is leading the way in helping data science teams accelerate time to market and amplify the work of machine learning throughout companies of all sizes across a wide variety of industries. At GGV we’re very excited to back this exceptional founding team and support their ambitious global growth plans.”
Let’s take it from the start then. In Treuille’s words, he and his co-founders came to be entrepreneurs via academia, doing machine learning and big data and AI before they were called by these names, and certainly before they were cool. Through his stints with the AI teams at Google X and Zoox, Treuille observed a pattern.
The promise of machine learning and artificial intelligence was often sequestered in those groups, and was not influencing the organization as well or as easily as it could. That led Treuille to start working on a pet project to solve this. Eventually, it started getting used by a number of engineers and growing really quickly. Then investment came, and a big open-source launch.
Streamlit is working on enabling data scientists to develop data-driven applications in a fraction of the time it normally takes. Image: metamorworks, Getty Images/iStockphoto
Streamlit grew from a one-man project to being used in a number of Fortune 500 companies, and beyond, under the radar, until today. And it worked that way for a number of reasons.
First, Treuille and his co-founders leveraged their network. Second, they open-sourced Streamlit, which made it easy for everyone to adopt and experiment with. Third, and perhaps most important, they captured what Treuille called the Zeitgeist. They offered a solution to a problem that data scientists, and the organizations employing them, are facing:
How to go from fiddling with datasets and models to deploying an application that uses them in production. In essence, this requires a number of people to work together: At the very least, data scientists and application developers. As usual in situations like these, skills and culture differ, and collaborating costs time and money.
Streamlit cites Delta Dental as an example. They were told that using AI to analyze their call center traffic would cost a hefty amount and take a year. A data scientist at Delta Dental used Streamlit instead, and he built an MVP in a week, a prototype in three weeks, and had it in production in three months, all at zero cost to the company, says Streamlit.
Taking the application developers out of application development
To understand how this is possible, we need to dive deeper into how Streamlit works. Streamlit tries to take the application development team out of the picture, by enabling data scientists to develop their own applications.
Treuille elaborates on the conundrum of getting data scientists to build applications, or getting application builders to work with data scientists. Data scientists do not necessarily have the core skills for application building, and their applications end up being unmaintainable. Application builders move on to other applications, leaving features frozen.
Streamlit lets data scientists create applications as a byproduct of their workflow. It takes their Python scripts and turns them into applications by letting them insert a few lines of code that abstract application constructs such as widgets.
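To make that concrete, here is a minimal sketch of what such a script could look like. It is our own illustration, not code from Streamlit or from Treuille: The moving_average function and the sample readings are hypothetical stand-ins for whatever analysis a data scientist already has, and the only Streamlit calls used are st.title, st.slider, and st.line_chart. The import is wrapped in a try/except purely so that the analysis function can also be used where Streamlit is not installed.

```python
# app.py: an illustrative sketch, not taken from Streamlit's documentation.
try:
    import streamlit as st  # assumes `pip install streamlit`
except ImportError:  # lets the analysis code below run without Streamlit
    st = None


def moving_average(values, window):
    """Trailing moving average; early points use as many values as exist."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical sample data standing in for a real dataset.
READINGS = [3, 5, 4, 8, 7, 9, 6, 10]

if st is not None and __name__ == "__main__":
    # The "few extra lines": each call below renders one UI element.
    st.title("Smoothing demo")
    window = st.slider("Window size", 1, len(READINGS), 3)
    st.line_chart({"raw": READINGS, "smoothed": moving_average(READINGS, window)})
```

Saved as app.py, a script like this would typically be launched with `streamlit run app.py`. Moving the slider re-runs the script from the top and redraws the chart, which is why the analysis code itself needs no changes.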
That’s unorthodox. Software engineers would argue there’s a reason why web development frameworks exist, for example. And there are many years of experience and best practices distilled into them. To throw them all away in favor of annotated Python scripts would look like bad practice, not to mention, an existential threat.
Treuille begs to differ. To support his view, besides widespread adoption, he argues that this is a different way of developing applications. The applications are different, the scope is different, and Streamlit does not intend to reinvent the application development wheel, but rather, to integrate it:
“We view ourselves as a translation layer between the Python world and the web framework world. For example, everything in Streamlit is written in React. When you’ve discovered the joy of React, that’s like programming nirvana. We can take almost anything in the React ecosystem, and translate that into Streamlit almost effortlessly. So our core technology is really that translation layer.”
From a Python script to a web application, with a few extra lines of code. Image: Streamlit
Treuille went on to add that soon Streamlit will enable any developer to translate any bit of web tech into a single Python function, thus allowing the two ecosystems to flourish independently of one another. The same approach is also taken with regard to using other Python frameworks such as Dask or Ray, for example:
“Streamlit is very modest, in some ways, very small. And therefore we sit alongside whatever — the whole Python data ecosystem. And that is really exciting because of the bigger story here, which is way, way bigger than Streamlit. It’s the data world which was at one time big databases, and then it was Excel, and then it was Tableau, and more recently Looker.
This tsunami is coming, which is open source and machine learning, and Python, and Pandas, SciKit learn. This is basically 20 years of academic research into machine learning, crashing into the data world, and completely transforming it. And we view ourselves as just a little surfboard in that wave, just riding it, or trying to ride it as best we can.”
There’s an app for that. Should you build it with Streamlit?
That may explain the approach, but not the scope. There is more to applications than data and data-driven features. If you are Netflix, for example, the core business revolves around streaming, and the applications should reflect that. They should enable people to manage payments, stream films, and so on.
Recommendations add to that, powered by data and machine learning. But they are not the core business. Treuille acknowledged that Streamlit does not aspire to be the front end to your entire company: “If Netflix came to us and said, hey, we want to write the Netflix app website in Streamlit, we’d say we don’t think that’s a good idea.”
Streamlit is not a general-purpose application development framework. What it does, in a way, is the same thing that business intelligence application frameworks did for databases. It provides a framework that enables quick access to the underlying source of value. For BI frameworks, it was data stored in databases. For Streamlit, it’s machine learning models.
We would still question how many data scientists, or their managers for that matter, would be happy with adding the task of maintaining their Streamlit applications on top of everything else they already do. We would also question whether application developers can, or should, be taken out of the picture entirely, even for purpose-built, data-driven applications, as they grow over time. But Streamlit is too early in its lifecycle to be able to answer those questions.
Data scientists are not necessarily the most suitable people to develop applications. Image: Streamlit
That, however, does not seem to have stopped users or investors. Speaking of which, there’s another interesting question here. What is Streamlit’s business model, and how did it convince people to invest money in it? In a nutshell: Software as a service in the cloud, with a tweak.
You can use Streamlit to develop any application without any restrictions. What you pay for, optionally, is deployment. Users can deploy Streamlit anywhere they please, on their own. But Streamlit offers its own cloud solution, called Streamlit for Teams, which comes with additional features around collaboration and deployment.
Treuille was adamant about Streamlit’s bottom-up sales strategy: Just getting the software out there, enabling people to start building applications, and then converting a part of them to paying users.
The bigger picture: Software 2.0
Streamlit is interesting, if for nothing else, because of the different paradigm it brings to application development, which in turn is part of what Treuille sees as a different way of building applications:
“The bigger picture is the way that the Python ecosystem and the community of open source developers and academic developers and corporations — TensorFlow is built by Google, PyTorch by Facebook — how all of these different forces have come together to create this incredibly powerful data ecosystem. That truly can revolutionize the show. That truly has different properties than just a simple spreadsheet and a list of your sales over the past year.”
Some people refer to this as Software 2.0. What we wondered, however, was whether the world is really ready for this. In many ways, most organizations probably have not gotten Software 1.0 right yet. Version control, release management, software development tools, and processes — these are not exactly trivial things.
Now add to that dataset management, provenance, machine learning, feature engineering, and versioning, to name but a few of the concerns of data-driven development, and what you get is a combinatorial explosion. Treuille conceded that this has really been part of the Zeitgeist over the past couple of years.
Treuille sees Streamlit as being part of a wave of new startups such as Tecton or Weights and Biases, which are essentially productionizing every layer of that stack. He believes talented people are working on this, and it’s coming into view. His take on how to get with the program:
“If you are a company, asking yourself how to get into this world, what is even the first step, I would say: Go to Insight Data Science. Hire one of their machine learning engineers or data scientists finishing the school for data scientists, and then give them Streamlit.”