Bridging the Gap between Data Science and Product Development with Streamlit

Product Development · Published Dec 13, 2023 · Updated Apr 11, 2024 · 13 minute read


    At Databox, scientific research and product development are important fundamental pillars of our innovative ecosystem. Our scientific endeavors fuel groundbreaking discoveries, while our product team ensures these discoveries are converted into practical solutions tailored to meet our customers’ needs. Yet, there’s often a clear gap between these crucial areas, making it challenging to smoothly apply scientific discoveries in real-world scenarios. In our pursuit of a practical solution, we discovered a powerful tool – Streamlit. This blog article explains how we used it to seamlessly connect the realms of scientific research and product development.

    Prototyping Data Science Applications with Streamlit 

    Streamlit is an open-source framework designed to streamline the creation of interactive web applications specifically for data science and machine learning. Its primary objective is to simplify the process of constructing and disseminating data-centric applications, allowing scientists and engineers to effortlessly display their work, which encourages a culture of collaboration and shared knowledge. Introduced approximately five years ago, Streamlit has attracted a dedicated audience within data science and machine learning teams at notable organizations such as Stitch Fix, Uber, Yelp, and Google X. In 2022, it was acquired by the cloud data warehouse giant Snowflake.

    With its simplicity, it makes the development of data analysis tools, visualization techniques, and machine learning models considerably more accessible. Its true value lies in the ability to transform complex scientific concepts into tangible applications, thereby fostering an environment that encourages innovative breakthroughs.

    At Databox, we found that Streamlit was the essential solution for effectively conveying the intricate findings and insights curated by our Data Science team to our Product team. Besides presenting data-driven revelations, it also helped cultivate a collaborative environment where the nuances of data science seamlessly integrate into the broader spectrum of our product strategy. This synergy resulted in a more agile and informed decision-making process, empowering our product team to leverage the full potential of data science in crafting innovative solutions for our users. 

    Crafting an Interactive Application

    In the realm of data science, the ability to visualize data is critical for interpreting complex patterns, identifying emerging trends, and understanding sophisticated relationships within large datasets. It is often claimed that roughly two-thirds of people process information more effectively when it is presented visually.

    In the Data Science team, we previously accomplished this using Jupyter Notebook, a logical choice given our reliance on Python for data science projects. While Jupyter Notebook marked a crucial stride in visualizing our results, generating satisfactory outcomes still demanded considerable effort. To streamline this process, we sought a tool capable of creating interactive visualizations and presenting complex data in a format that is both accessible and engaging. Amidst various options, we ultimately chose Streamlit. The most important aspects on which we compared the two, and which influenced our decision, are collected in the table below.

    One of the key factors was Streamlit’s straightforward syntax and its simple layout and visual components. These features allow us to shift our focus toward creating robust backend algorithms and procedures. In other words, developers can spend less time worrying about the aesthetics and presentation of results and more time on what truly matters: the data and its analysis. It’s important to note that while the visualizations on offer are not fully customizable, the tool should be seen as an ally in data visualization, albeit not necessarily the be-all and end-all solution for creating the perfect user interface.

    Jupyter Notebook vs Streamlit

    Streamlit Components and Their Implications

    Streamlit’s syntax is straightforward, and the positioning of components within a layout is simplified, freeing a developer from diving into visualization complexities. Most elements can be displayed with just a single line of code, making it incredibly user-friendly.

    A range of basic interactive input elements, such as sliders, input fields, and checkboxes, provides a way to interact with complex algorithms, creating a more engaging user experience. Furthermore, it provides a variety of ways to display data, from plain text and tables to interactive grids and a multitude of charting options, catering to your visualization needs.

    In addition, displaying multimedia elements, such as images, videos, and other forms of media, is as easy as displaying any other element, making visualizations more dynamic and compelling. Chart elements sync harmoniously with Python libraries like Matplotlib, Plotly, and Altair, offering a rich spectrum of choices.

    For those who like to venture beyond the conventional, Streamlit is flexible enough to allow the development of custom components. If the existing options do not meet your needs, you have the freedom to create your own, thereby tailoring your data visualization application to your exact specifications.

    A User-Friendly Showcase

    To witness the simplicity of the package, here’s a brief snippet of code in Streamlit.

    import streamlit as st
    import pandas as pd
    import matplotlib.pyplot as plt
    df = pd.read_csv('./data/sample_data.csv')
    col1, col2 = st.columns([1, 1])
    col1.line_chart(df, y='y')
    nr_bins = col2.slider('Number of bars', 0, 30, 10)
    fig, ax = plt.subplots()
    ax.hist(df.y, bins=nr_bins, rwidth=0.9)
    col2.pyplot(fig)

    In just a few lines of code, you read a CSV file, present its data as a dynamic line chart, and craft an interactive histogram with a user-defined number of bins:

    Data Analysis Prototype

    Integrating Streamlit in Data Science Projects

    Leveraging the power of the new tool, we have managed to introduce a novel step in our feature development process. This step serves as a bridge between the research and development initiatives undertaken by our Data Science team and the subsequent integration of these initiatives into our product by the Product team. This new process has added an extra layer of robustness and efficiency to our operations, ensuring our product development is more streamlined and effective.

    In the subsequent section, we will present two examples that have been game-changing for our operations. The purpose of sharing these examples is not just to highlight our accomplishments but also to inspire other teams and individuals who may be facing similar challenges to explore the potential of Streamlit in simplifying their data applications and fostering innovative solutions. We strongly believe in the power of shared knowledge and hope our experiences can provide valuable insights to others in the industry.

    Prototyping our Forecasting Feature

    Delving into the vast expanse of large time series data sets naturally evokes a demand for forecasting future trends. While predicting the future with absolute certainty remains an elusive goal, our world is saturated with diverse data forecasting methodologies. For our customers, the prospect of insights into future trends of their performance through forecasting is an invaluable asset.

    Databox’s Data Science team was tasked with exploring existing forecasting methods and identifying those best suited to the scenario at hand. Upon the successful completion of our research, we developed a forecasting service, ready to be utilized by the Product team to build a new forecasting feature that calculates highly accurate predictions of our users’ most important metrics. We compiled the findings of our investigation into a comprehensive report and shared it with the Product team.

    This allowed the Product team to integrate the feature Metric Forecasts into Databox Analytics and thus answer the customers’ requests for future insights into their performance. They primarily used the default parameter values recommended by the Data Science team and demonstrated the results. This feature gained positive traction among our customers and was well-received.

    However, the introduction of the Metric Forecasts feature swiftly sparked a flurry of questions from our customers. The Product and Customer Support teams found themselves unable to provide prompt responses. We soon identified a discrepancy in the perception of the feature and its parameters. It became evident that hypothetical discussions about how varying parameters could impact the forecast outcome were inadequate. The few images and graphs in the report provided insufficient information, which kindled our first exploration of Streamlit.

    The estimated efforts to demonstrate the forecasting feature and facilitate interaction with its parameters were surprisingly favorable. The intuitive design and comprehensive documentation enabled the swift development of a feature visualization, aligning well with our timeline. An even more delightful revelation was the enhanced level of conversation that this process fostered between the Data Science and Product teams.

    The capability to interactively modify parameters and present the outcomes to the Product team significantly amplified their comprehension of the feature, clarifying numerous previously unanswered questions. By employing Streamlit and offering a deeper understanding of the feature’s workings, we also discovered additional opportunities to enhance our customer service.

    The unanimous consensus was that this approach should be adopted for the development of most features by the Data Science team in the future. The immensely positive experience with Streamlit has paved the way for its continued use in our future endeavors.

    Optimizing Our Data Analysis

    Another opportunity to effectively showcase the work of our Data Science team arose from a request for an in-depth analysis of our customer data. This analysis potentially involved the application of an advanced clustering algorithm.

    Upon initiating this request, the team was presented with a spreadsheet document serving as an example. This document comprised an analysis of data from about 3500 active customers, depicted through several charts, pivot tables, and various other visualizations. 

    Accessing data directly in the source database significantly amplified the scope and scale of the data analysis. This direct access not only enabled the analysis of larger data sets, which included parameters such as inactive customers and data from diverse sources, but also supported the implementation of as many visualizations as needed by the user, along with capabilities for filtering and grouping. To further enhance the request, a clustering algorithm was integrated for customer segmentation, and the results were visualized. The addition of tooltips on the 2D clustering chart greatly simplified the explanation of the results, even to users who were more familiar with spreadsheet formats.

    During the analysis, there were suggestions for further extensions and visualization needs. Fortunately, due to the simplicity and flexibility of the tool, finding extra time to implement these extensions was not a challenge. In contrast, using our traditional approach (e.g. plotting through Jupyter notebooks) would have likely consumed significantly more time. As a bonus, the fact that the application operates as an internally deployed service for our company was a considerable advantage. It eliminated the need for users to possess any technical background or setup on their computers.

    The ability to work with filters, visualize results, and analyze the outcomes of customer clustering provided us with a fresh perspective on the structure of our customer base. This led to improved positioning and evaluation strategies. As an added benefit, it revealed insightful information about the hidden characteristics of our customers.

    Streamlit’s Role in Shaping Tomorrow’s Applications

    The positive reception of the new approach has prompted us to rethink the role of prototyping in our upcoming applications. While we acknowledge the necessity of incorporating Streamlit into our development process before deploying a feature to product development, we are exploring the idea of leveraging it for more substantial prototyping: creating rapid prototype solutions for features before initiating the full implementation.

    With this objective in mind, we intend to adhere to the widely recognized CRISP-DM model for data science, with slight modifications tailored to our specific requirements, as illustrated in the diagram below. Depending on the complexity of the problem, we may incorporate an additional prototyping step. The key distinction between prototyping and development lies in prototyping’s capacity to quickly estimate the best fit (model, package, dataset, etc.) for the given problem, while development entails high-quality coding with a focus on details. During the evaluation phase, Streamlit can once again provide a straightforward means of manipulating parameters to evaluate the best coefficients.

    Feature development process

    This approach aims to expedite the testing of various libraries and market solutions, enabling quicker decision-making and more effective project planning. Simultaneously, having a prototype at our disposal is expected to facilitate swifter cross-team communication and foster productive discussions regarding the potential impact of the solution on our product.

    Embracing Streamlit for Visual Clarity and User-Friendly Functionality

    One of the key challenges we encountered was effectively presenting scientific results, a task that proved more demanding than initially anticipated. Without an in-depth understanding of the specific scientific domain, conveying development outcomes could easily result in misinterpretation or a lack of comprehension regarding the underlying features.

    The usage of visual tools significantly simplifies the communication of boundaries, as well as the explanation of the ‘why’ and ‘how’, all without imposing a substantial burden on the data science team. The tool’s visual appeal and interactivity offer an exploratory platform for anyone interested in comprehending the factors that may impact algorithmic outcomes.

    While there are various alternatives on the market, Streamlit’s perceived rigidity and limited customizability have not proven to be impediments to our use cases at Databox so far. In fact, it’s the tool’s user-friendly simplicity and straightforward application that have been the decisive factors in our choice. However, it’s worth exploring alternatives like Gradio or Anvil if Streamlit doesn’t adequately meet your specific needs.

    A New Era of Prototyping

    The successful integration of Streamlit into the routine operations of the Data Science team at Databox has transformed the dynamics of collaboration between the Data Science and the Product teams, ushering in a new era of synergistic teamwork.

    The demonstration and practical application of our data science models have transformed into a seamless, user-friendly, and highly collaborative experience with the help of Streamlit. It quickly became a game-changer, smoothly connecting scientific research and product development. Its user-friendly design allows scientists and engineers to showcase their advanced work, use complex machine-learning models, and create interactive data visualizations with ease.

    It smooths the transition from academic research to practical product development, fostering an environment conducive to cutting-edge innovation and creativity. This seamless transition serves to expedite the entire process, enabling teams to work more efficiently and deliver results faster.

    What makes it unique is its ability to make our data science accessible to a broad team, allowing them to manage it effectively. It enables us to concentrate on the essential aspects of data analysis and model implementation without getting bogged down in the intricacies of coding advanced visualizations.

    If you are in the realm of data science, we encourage you to delve into the world of Streamlit. Explore its comprehensive features, engage with its intuitive interface, and experience firsthand the immense value it can add to your scientific and product development endeavors. Embark on this journey and unlock its vast potential in bridging the gap between the worlds of science and product development.

    Unraveling the dynamic integration of Streamlit into our workflow is part of a series of technical articles that offer a look into the inner workings of our technology, architecture, and product & engineering processes. The authors of these articles are our product and engineering leaders, architects, and other senior members of our team, who share their thoughts, ideas, challenges, and other innovative approaches we’ve taken to constantly deliver more value to our customers through our products.

    Katja Tič, PhD, is a Data Scientist within the Data Science team, playing a pivotal role as a bridge between product development and data science. Her responsibilities include identifying areas that require a more scientific approach for effective solutions, all while devising strategies to present advanced features in a way that resonates with customers, showcasing their added value.

    Stay tuned for a stream of technical insights and cutting-edge thoughts as we continue to enhance our products through the power of data and AI.
