For data scientists, the notebook has become an indispensable tool. Jupyter Notebooks, in particular, have reigned supreme for their interactive nature, combining code, visualizations, and narrative in a single document. But as projects grow in complexity, cracks in the traditional workflow can appear. I found out about a solution while listening to the “Super Data Science Podcast,” where guest Dr. Akshay Agrawal, co-founder and CEO of Marimo, discussed these very frustrations. Enter Marimo, a reactive Python notebook that’s reimagining the data science experience and offering some compelling advantages over its older sibling.
If you’re finding yourself wrestling with stale outputs, confusing execution order, or the challenges of sharing and reproducing your notebook results, it might be time to explore why Marimo could be the breath of fresh air your workflow needs.
The Reactive Revolution: Outputs That Keep Up
One of Marimo’s most significant departures from Jupyter is its reactive execution model. Forget manually re-running cells in the correct order after making a change. To understand the frustration this eliminates, let’s look at a common scenario in Jupyter.
In a Jupyter Notebook, cells can be executed in any order, and this can lead to incorrect or confusing results, especially when a cell depends on a variable or state from a cell that was run at a different time. This can create a hidden, inconsistent state.
The Problem in Jupyter
Imagine you have a notebook to calculate a final price with a tax rate. The notebook has three cells: one for the base price, one for the final calculation, and one for the tax rate, which sits below the calculation that uses it.
Cell 1: Define the base price.
Python
price = 100
Cell 2: Calculate the final price.
Python
final_price = price * (1 + tax_rate)
print(final_price)
Cell 3: Define the tax rate.
Python
tax_rate = 0.08 # 8%
What Goes Wrong
Here’s the scenario that creates the error:
- Initial Run: You run the cells in the correct order: Cell 1, then Cell 3, then Cell 2. The output is 108.0, which is correct.
- The Change: Later, you decide to change the base price in Cell 1 from ‘100’ to ‘200’.
- The Mistake: Instead of re-running all the cells in order, you only run Cell 1 again and then re-run Cell 2.
- The Result: Because you didn’t re-run Cell 3, the ‘tax_rate’ variable is still holding its original value of ‘0.08’. The new “price” of ‘200’ is used with that remembered “tax_rate”, and the output is ‘216.0’. Here the answer happens to be right, because the tax rate never changed, but it is right only because the kernel still remembers a value from an earlier run.
The real problem arises when you edit Cell 3 to change the tax rate to ‘0.05’ but forget to run that cell before the final calculation. Cell 2 still sees ‘0.08’, so it prints ‘216.0’ instead of the ‘210.0’ you intended, and there is no error message to warn you.
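To make the hidden-state problem concrete, here is a minimal sketch that imitates a kernel with a single shared dictionary. It is a toy model, not how Jupyter is actually implemented: the `kernel` dictionary, the `cells` mapping, and the `run` helper are all hypothetical names introduced for illustration.

```python
# Toy model of a notebook kernel: one shared namespace that every cell mutates.
kernel = {}

cells = {
    1: "price = 100",
    2: "final_price = price * (1 + tax_rate)",
    3: "tax_rate = 0.08",
}


def run(cell_number):
    """Execute one cell's source against the shared kernel namespace."""
    exec(cells[cell_number], kernel)


# Initial run in dependency order: Cell 1, Cell 3, Cell 2.
for number in (1, 3, 2):
    run(number)
first_result = kernel["final_price"]

# Edit both cells on screen, but only re-run Cell 1 and Cell 2.
cells[1] = "price = 200"
cells[3] = "tax_rate = 0.05"
run(1)
run(2)
stale_result = kernel["final_price"]  # new price, old tax rate

# Running the skipped cell brings the state back in line with the code.
run(3)
run(2)
fresh_result = kernel["final_price"]

print(first_result, stale_result, fresh_result)
```

The middle value is computed from the new price and the old tax rate, and nothing in the output signals that the code on screen and the state in memory have drifted apart.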
Why Marimo Avoids This
In a reactive notebook like Marimo, the outcome would be different. If you changed the “price” in Cell 1, Marimo’s reactive engine would automatically detect that Cell 2 depends on “price”, and it would re-execute Cell 2 for you, updating the “final_price” immediately and correctly. This eliminates the possibility of out-of-order execution leading to an inconsistent state.
Think about it: you tweak a parameter in your data cleaning step, and instantly, all your subsequent analysis and visualizations update. This eliminates the frustrating and often error-prone task of manually tracking dependencies and ensuring your notebook is in a consistent state. It’s a game-changer for exploratory data analysis, allowing you to iterate and see the impact of your changes in real-time.
Advantages of a reactive notebook:
- Automatic state consistency: No more stale outputs or the dreaded “run all cells” scramble.
- Faster iteration: See the impact of your code changes instantly across your entire notebook.
- Reduced cognitive load: You can focus on your analysis rather than managing execution order.
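The idea behind reactive execution can be sketched in a few dozen lines. The code below is a toy illustration and not Marimo’s implementation: `ToyReactiveNotebook` and `defs_and_refs` are hypothetical names, and the analysis only looks at simple variable names. It reads each cell’s syntax tree to find which variables the cell defines and which it reads, then re-runs an edited cell and everything downstream of it in dependency order.

```python
import ast
from graphlib import TopologicalSorter


def defs_and_refs(source):
    """Return (names a cell assigns, names it reads) from its syntax tree."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                refs.add(node.id)
    return defs, refs - defs


class ToyReactiveNotebook:
    def __init__(self, cells):
        self.cells = dict(cells)  # cell name -> source code
        self.namespace = {}
        self.run_log = []
        self._run_in_order(set(self.cells))

    def _graph(self):
        """Map each cell to the set of cells whose variables it reads."""
        info = {name: defs_and_refs(src) for name, src in self.cells.items()}
        graph = {}
        for name, (_, refs) in info.items():
            graph[name] = {
                other for other, (defs, _) in info.items()
                if other != name and defs & refs
            }
        return graph

    def _run_in_order(self, wanted):
        """Run the wanted cells so that producers run before consumers."""
        for name in TopologicalSorter(self._graph()).static_order():
            if name in wanted:
                exec(self.cells[name], self.namespace)
                self.run_log.append(name)

    def edit(self, name, source):
        """Replace a cell's code, then re-run it and everything downstream."""
        self.cells[name] = source
        graph = self._graph()
        stale, frontier = {name}, [name]
        while frontier:
            current = frontier.pop()
            for other, parents in graph.items():
                if current in parents and other not in stale:
                    stale.add(other)
                    frontier.append(other)
        self._run_in_order(stale)


nb = ToyReactiveNotebook({
    "cell_1": "price = 100",
    "cell_2": "final_price = price * (1 + tax_rate)",
    "cell_3": "tax_rate = 0.08",
})
nb.run_log.clear()
nb.edit("cell_3", "tax_rate = 0.05")
print(nb.run_log, nb.namespace["final_price"])
```

Editing the tax-rate cell re-runs the final calculation without being asked, and the price cell is left alone because nothing it depends on has changed.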
Pythonic from the Ground Up: A More Natural Workflow
While Jupyter has its strengths, Marimo feels more inherently Pythonic. It leverages standard Python syntax and doesn’t rely on “magic commands” to the same extent, which leads to a more intuitive and less fragmented coding experience. A great example is how it handles plot outputs, which in Jupyter have traditionally been set up with a special “magic command.”
Jupyter Notebook Example
In a Jupyter Notebook, if you want to create a plot using a library like Matplotlib and have it appear directly below the code cell instead of in a separate window, the long-standing convention is to use a “magic command.”
Here is how you would do it in Jupyter:
Python
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()
The “%matplotlib inline” command is not standard Python. It’s a special directive interpreted by the IPython kernel that tells it how to handle the output of the plotting library. Recent Jupyter versions typically render plots inline by default, but in older setups, or when a different backend is configured, leaving the line out can mean the plot does not display at all or appears in a new window, which breaks the flow of the notebook.
Marimo Example
In Marimo, this is handled implicitly. Since Marimo is built to be a more Pythonic and reactive environment, it understands how to display the output of a standard plotting command without needing a magic command. You would simply write the standard Python code for your plot.
Here is the equivalent code in a Marimo notebook:
Python
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.show()
Notice that the code is identical to a standard Python script. Marimo’s environment is designed to handle this output automatically, leading to a more seamless and less fragmented coding experience that feels more like writing and running a pure Python file.
This commitment to standard Python syntax makes the coding experience feel much more natural.
The Power of Functions: A More Pythonic Workflow
Beyond its reactivity, Marimo also treats functions and variables as first-class citizens, making it easier to organize your logic and build more robust analyses. This structured, function-based approach is a key philosophical difference from other platforms. It’s a design choice that naturally encourages the kind of modular, maintainable code that is a hallmark of good software development.
The Problem in Jupyter
In a traditional Jupyter Notebook, it’s common for code to be written in a procedural style, spread across multiple cells. This can make the notebook difficult to read, debug, and reuse.
Jupyter Notebook Cells:
Cell 1: Data Loading
Python
import pandas as pd
import numpy as np
data = {
    'city': ['New York', 'New York', 'New York', 'Boston', 'Boston'],
    'temperature_c': [10, 12, 11, 8, 9],
    'is_daylight': [True, False, True, True, False]
}
df = pd.DataFrame(data)
Cell 2: Data Cleaning
Python
# Convert temperature to Fahrenheit
df['temperature_f'] = (df['temperature_c'] * 9/5) + 32
# Drop the old column
df = df.drop('temperature_c', axis=1)
# A print statement to check the result
print("After cleaning:")
print(df.head())
Cell 3: Visualization
Python
import matplotlib.pyplot as plt
# A plot of the temperatures
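# Average the readings per city so the bars match the chart title
avg_temp = df.groupby('city')['temperature_f'].mean()
plt.figure(figsize=(8, 4))
plt.bar(avg_temp.index, avg_temp.values)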
plt.title('Average Temperature by City (Fahrenheit)')
plt.ylabel('Temperature (F)')
plt.xlabel('City')
plt.show()
This style works, but it has problems:
- The logic for cleaning and visualizing is intertwined with the data.
- If you wanted to reuse the cleaning logic for a new dataset, you’d have to copy and paste the code from Cell 2.
- The “df” variable is being modified in place, making it harder to track its state.
- If you make a mistake in Cell 2, you have to remember to run Cell 3 again, or your plot will be based on the old, uncleaned data.
The Solution in Marimo: Modular & Functional Code
In Marimo, because of its reactive nature, you are encouraged to define functions to encapsulate your logic. When you define a function in one cell, it becomes a “first-class citizen” that is instantly available to all other cells. When you change the function, any cell that calls it automatically re-executes.
Marimo Notebook Cells:
Cell 1: Data Loading & Helper Functions
Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def load_data():
    data = {
        'city': ['New York', 'New York', 'New York', 'Boston', 'Boston'],
        'temperature_c': [10, 12, 11, 8, 9],
        'is_daylight': [True, False, True, True, False]
    }
    return pd.DataFrame(data)

def clean_data(df):
    df_cleaned = df.copy()  # Best practice: work on a copy
    df_cleaned['temperature_f'] = (df_cleaned['temperature_c'] * 9/5) + 32
    df_cleaned = df_cleaned.drop('temperature_c', axis=1)
    return df_cleaned
Cell 2: Main Analysis
Python
# Load the raw data
df_raw = load_data()
# Clean the data using the function from Cell 1
df_processed = clean_data(df_raw)
# Show the results
df_processed.head()
Cell 3: Visualization
Python
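# Average the readings per city so the bars match the chart title
avg_temp = df_processed.groupby('city')['temperature_f'].mean()
plt.figure(figsize=(8, 4))
plt.bar(avg_temp.index, avg_temp.values)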
plt.title('Average Temperature by City (Fahrenheit)')
plt.ylabel('Temperature (F)')
plt.xlabel('City')
plt.show()
The Marimo Advantage:
- Encourages Functions: The reactive nature of Marimo makes it easy and safe to use functions. If you update the “clean_data” function, Marimo automatically detects that “df_processed” depends on it and re-runs both Cell 2 and Cell 3, ensuring your visualization is always up-to-date.
- Separation of Concerns: The notebook is organized into logical units: data loading/cleaning logic is in one cell, the application of that logic is in another, and the visualization is in a third.
- Reusability: The functions “load_data” and “clean_data” are now clearly defined and easily reusable in other parts of the notebook or even copied to other scripts.
- Immutability and Traceability: By returning new DataFrames (“df_cleaned”), the functions avoid modifying global variables, making the data flow clear and easy to follow.
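The difference between mutating shared data and returning a copy is easy to demonstrate without any libraries. The sketch below is a stand-in that uses plain dictionaries rather than a DataFrame, and `make_rows`, `clean_in_place`, and `clean_copy` are hypothetical helpers. It shows that the in-place version cannot be run twice on the same data, while the copy-based version can be called as often as you like.

```python
import copy


def make_rows():
    """Stand-in for the DataFrame above: one dict per row."""
    return [
        {"city": "New York", "temperature_c": 10},
        {"city": "Boston", "temperature_c": 8},
    ]


def clean_in_place(rows):
    """Procedural style: convert and drop the column on the shared object."""
    for row in rows:
        row["temperature_f"] = row.pop("temperature_c") * 9 / 5 + 32


def clean_copy(rows):
    """Function style: leave the input alone and return a new list."""
    cleaned = copy.deepcopy(rows)
    clean_in_place(cleaned)
    return cleaned


# In-place cleaning works once, then fails because the column is gone.
rows = make_rows()
clean_in_place(rows)
try:
    clean_in_place(rows)
    rerun_error = None
except KeyError as exc:
    rerun_error = exc

# The copy-based function can be re-run safely; the raw data is untouched.
raw = make_rows()
first = clean_copy(raw)
second = clean_copy(raw)

print(type(rerun_error).__name__, first == second, "temperature_c" in raw[0])
```

The same thing happens with the procedural DataFrame version: running the cleaning cell a second time fails because the “temperature_c” column has already been dropped.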
This functional style, which is standard practice in Python programming, is a natural fit for Marimo and leads to more robust, readable, and reusable analyses. Here are some advantages of this approach:
- More natural Python experience: Less reliance on magic commands and a more integrated workflow.
- Encourages modular code: Easier to structure and maintain larger analyses.
- A potentially gentler learning curve for developers who already write plain Python.
Designed for Collaboration and Sharing: Reproducibility Built-In
Another compelling strength of Marimo is its focus on reproducibility and collaboration. By saving the notebook as a clean, version-control-friendly Python file, it makes sharing and deployment far simpler. Combined with reactive execution, this keeps your results consistent and smooths the path from exploratory analysis to sharing and deployment. The advantages of this approach include:
- Improved reproducibility: Your notebooks are easier to share and run consistently across different environments.
- Simplified deployment: The notebooks can be easily turned into interactive web applications.
- Enhanced collaboration: It’s easier for others to understand and interact with your work.
Jupyter Notebook (“.ipynb” format)
A Jupyter Notebook is saved as a single JSON file with a “.ipynb” extension. This file contains all the notebook’s data, including:
- The raw code for each cell.
- The output that was generated when the cell was last run (text, images, HTML, etc.).
- Metadata about the notebook and each cell (e.g., cell type, execution count).
Example Code in a Jupyter Notebook:
Cell 1:
Python
x = 10
Cell 2:
Python
print(x * 2)
Saved “.ipynb” File Content (as raw JSON):
JSON
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "x = 10"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "20\n"
          ]
        }
      ],
      "source": [
        "print(x * 2)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
Observation: The actual Python code (“x = 10”, “print(x * 2)”) is buried within a complex JSON structure. This makes it difficult to read, version control, or run as a standalone script.
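To see how much unwrapping that takes, here is a minimal sketch that pulls the code back out of a notebook document using only the standard library. The `extract_code` helper is a hypothetical name, and the embedded notebook is a trimmed-down version of the example above; dedicated tools such as “jupyter nbconvert --to script” handle the general case.

```python
import json

# A trimmed-down notebook document with the same two cells as above.
ipynb_text = json.dumps({
    "cells": [
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "outputs": [], "source": ["x = 10"]},
        {"cell_type": "code", "execution_count": 2, "metadata": {},
         "outputs": [{"name": "stdout", "output_type": "stream",
                      "text": ["20\n"]}],
         "source": ["print(x * 2)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})


def extract_code(notebook_json):
    """Return the source of every code cell, dropping outputs and metadata."""
    notebook = json.loads(notebook_json)
    return [
        "".join(cell["source"])
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]


code_cells = extract_code(ipynb_text)
print("\n".join(code_cells))
```

Two lines of Python come out of a document many times that size, which is also why diffs of “.ipynb” files tend to be noisy: a change in output or execution count shows up even when the code is untouched.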
Marimo Notebook (“.py” format)
Marimo notebooks are saved as a standard Python file with a “.py” extension. This file contains only the code. Marimo uses standard Python functions and decorators to define the structure of the notebook, but the file itself is a plain Python script.
Here is the file content (as raw Python) of the same example above used in a Jupyter notebook:
Python
import marimo

__generated_with = "0.14.17"
app = marimo.App(width="medium")

@app.cell
def _():
    return

@app.cell
def _():
    x=10
    return (x,)

@app.cell
def _(x):
    print(x*2)
    return

if __name__ == "__main__":
    app.run()
Observation: The file is just standard Python code. It’s clean, human-readable, and can be executed directly as a Python script. The Marimo-specific decorators (@app.cell) provide the necessary structure for the notebook environment but do not interfere with its readability or ability to be version-controlled using tools like Git.
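Because each cell is an ordinary function, the dependencies between cells are visible in the file itself: a cell’s parameters are the variables it reads, and its return tuple lists the variables it defines. The sketch below reads that structure with the standard “ast” module. It is only an illustration of the file layout shown above, not a Marimo API: `describe_cells` is a hypothetical helper, and the embedded source omits the empty cell and version line for brevity.

```python
import ast

# The notebook file from above, embedded as a string so nothing is imported.
notebook_source = '''
import marimo

app = marimo.App(width="medium")

@app.cell
def _():
    x = 10
    return (x,)

@app.cell
def _(x):
    print(x * 2)
    return

if __name__ == "__main__":
    app.run()
'''


def describe_cells(source):
    """List (inputs, outputs) for each top-level function in the file."""
    cells = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        # Parameters are the variables the cell reads from other cells.
        inputs = [arg.arg for arg in node.args.args]
        # A returned tuple names the variables the cell makes available.
        outputs = []
        for stmt in node.body:
            if isinstance(stmt, ast.Return) and isinstance(stmt.value, ast.Tuple):
                outputs = [
                    elt.id for elt in stmt.value.elts
                    if isinstance(elt, ast.Name)
                ]
        cells.append((inputs, outputs))
    return cells


cells = describe_cells(notebook_source)
for inputs, outputs in cells:
    print("reads:", inputs, "defines:", outputs)
```

The first cell defines “x” and the second reads it, which is the same information a reviewer gets from glancing at a Git diff of the file.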
Dependencies, environment issues, and the aforementioned execution order can lead to headaches. Marimo addresses these concerns with a focus on reproducibility:
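- Self-Contained Execution: A Marimo notebook file stores only code, and the execution order is derived from the dependencies between cells rather than from the order in which you happened to run them, which makes a notebook more self-contained and easier to reproduce.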
- Simplified Sharing: Marimo notebooks can be easily shared as standalone Python scripts or deployed as interactive web applications with minimal effort. This makes it simpler for collaborators or stakeholders to interact with your analysis without needing the full Jupyter environment.
- Clear Dependency Tracking: The reactive nature of Marimo inherently makes dependencies clearer and less prone to hidden issues.
The Verdict: A Modern Take on Data Science Notebooks
While Jupyter Notebooks remain a powerful and widely used tool, Marimo offers a compelling alternative that addresses some of the pain points associated with the traditional notebook workflow. Its reactive execution, Python-first approach, and focus on reproducibility make it an attractive option for data scientists looking for a more streamlined and collaborative experience.
Of course, the best tool depends on your specific needs and project. However, if you’re seeking a more modern, reactive, and shareable notebook environment, it’s definitely worth giving Marimo a try. It just might revolutionize the way you approach your next data science project.
Have you experimented with Marimo? Share your thoughts and experiences in the comments below!
