testcell: run and forget

jupyter
ipython
nbdev
Author

Stefano Giomo

Published

May 19, 2023

Risotto with porcini

You’re cooking a risotto with fresh porcini mushrooms from scratch, chopping carrot and onion for the broth, searing the porcini heads with some garlic, and using pans for the base, spoons to blend and taste, knives for vegetables and parsley, and even a Parmesan grater for the final touch. You’ve made the perfect risotto, but now your kitchen is like a battlefield. What if the kitchen could magically clean itself after you’ve finished eating, restoring everything to its clean and organized state?

Your notebook cell is the kitchen, and the code you want to run is the meal you want to enjoy. testcell is the cell magic that takes care of cleaning up the mess (†) your code spits out and lets you focus on the important part: thinking and iterating.

Furthermore, when you enable the noglobals option (aka: testcelln), it’s akin to rebuilding your whole kitchen from scratch each time you cook a meal. In the real world that would be very dumb, but in software development it’s incredibly valuable for ensuring reproducibility and maintaining complete control over the process.

I know, “we’re talking about programming with the metaphor of cooking” (that’s a famous quote from a great conversation between Lex Fridman and Guido van Rossum, at minutes 7:50 and 11:58). But in the end, code and cooking are all about recipes.

WARNING: It’s important to note that testcell can only clean up the mess created by the code you run in that cell if it doesn’t have any side effects (that’s the meaning of the (†) above). Pay close attention to this warning: if you delete a file, it will not magically reappear; if you modify a record in a database, it will not roll back to its previous state; and if you change an entry in a dictionary in the global namespace, it won’t automatically revert to its original value.
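To make the warning concrete, here is a minimal plain-Python sketch. It does not use testcell itself: it only imitates the idea of running the cell body inside a temporary function, and the names settings, _cell_ and local_note are made up for the illustration. The local variable disappears, while the change to the global dictionary stays.

```python
settings = {"mode": "clean"}  # hypothetical global state living in the notebook


def _cell_():  # stand-in for the temporary function wrapping the cell
    local_note = "discarded when the function returns"  # plain local variable
    settings["mode"] = "messy"  # mutates a shared object: this survives
    return local_note


try:
    result = _cell_()
finally:
    del _cell_  # the wrapper is removed, the side effect is not

print(settings)  # prints: {'mode': 'messy'}
```

The assignment to local_note never reaches the global namespace, but the dictionary was changed in place, so nothing can undo it afterwards.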

HOW IT WORKS

Start by installing and importing it:

# Use pip to install it
!pip install testcell
# Remember to import it
import testcell

Here is testcell in action:

%%testcell
# This is a jupyter or ipython cell
a=1
a

1

The variable a has not been added to the notebook’s globals.

'a' in globals().keys()

False

As the name testcell suggests, it is a way to explicitly mark a cell and say “this is a cell containing a test”: it can capture the notebook state, reason about it and clean up the mess. It’s different from scratchpad, an official Jupyter nbextension, where you ephemerally write and run code in an interactive terminal outside of your notebook. It’s also different from the typical exploratory notebook workflow where you write code in a temporary cell, execute it, make plots, review results, and eventually delete the cell.
Unlike these approaches, testcell does not introduce any changes to the global state (aka: no mess). Moreover, %%testcell annotated cells are meant to “stick” and be part of your notebook, not some transient code you run and dispose of.

As a closing note, testcell is a thinking tool that helps you to easily try new stuff and write cleaner code.

IT’S ALL ABOUT NOTEBOOKS

If you’re reading this, I assume you’re familiar with Jupyter Notebook. If you enjoy its multimedia literate programming paradigm, I’m sure you’ll appreciate fastai’s nbdev as well. testcell has been developed from scratch using nbdev, covering everything from code to documentation, Continuous Integration, and final deployment on PyPI.

testcell integrates seamlessly with the nbdev workflow, helping you keep your test fixtures out of the main scope and visually mark the cells containing tests.

NOTE: the following example has been adapted from the official nbdev tutorial

#| export
def say_hello(to):
    "Say hello to somebody"
    return f'Hello {to}!'

Once you’ve defined your function in a separate cell, you can define some tests using arbitrarily complex fixtures:

%%testcell
# Assumes test_eq is already imported, e.g. `from fastcore.test import test_eq`
def create_fixture():
  return ['Hamel','Stefano']

for n in create_fixture():
  test_eq(say_hello(n), f"Hello {n}!")

As you can see, the local function create_fixture is not part of the global state:

'create_fixture' in globals().keys()

False

As a closing note, I want to mention that testcell works not only with Jupyter but with IPython too:

In [1]: import testcell

In [2]: %%testcell
   ...: a=1
   ...: a
Out[2]: 1

In [3]: 'a' in globals().keys()
Out[3]: False

WHY I WANTED IT:

In the last few weeks, I’ve used this in my workflow, both to debug it before release and to see what other needs and use cases might arise. The following is a list of real-life scenarios I’ve tested it on:

To try new stuff

REPLs (Read-Eval-Print Loops) like Jupyter and IPython are excellent tools for thinking and experimenting, that is, for exploratory programming. One downside is that they often lack a “sandbox” environment, meaning that if you’re not careful, your actions can impact global state. By annotating a cell with testcell (or testcelln if you desire complete isolation - more on this later), you can evaluate code snippets copied from Stack Overflow or provided by your favorite LLM assistant without the risk of injecting new stuff into your global namespace.

To test and document your code

In the previous cells, you created a new function; in the current cell you want to test it or document some use cases. To prevent the fixture code from affecting the global state, you can annotate the cell with testcell. This ensures that the code made for testing purposes remains isolated. The nbdev workflow we discussed earlier serves as an example of this particular scenario.

To work with huge state

Let’s imagine you’re working on a Kaggle notebook, and you’ve just completed a lengthy training process. Now you want to write a new function that utilizes the clean and precious “state” you’ve obtained (model, variables, etc). I’m sure you’ve saved this state, but adding a bunch of new code directly into the notebook could clutter it with unnecessary items that can potentially consume significant memory. You also want to avoid restarting the notebook to clean the state to save time. The solution is to annotate the cell where you conduct your experiment with testcell. This keeps the notebook clean and organized while leveraging the previous state.
In the following example, after executing the cell and displaying the figure, the large X_tsne variable with shape (100000, 2), the tsne object, and the imported names plt and TSNE will not be added to the global scope.

%%testcell
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assuming X.shape=(100000,100)

# Perform t-SNE dimensionality reduction
tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(X)

# Create a scatter plot of the reduced data
plt.scatter(X_tsne[:, 0], X_tsne[:, 1])
plt.title('t-SNE')
plt.show()

To avoid forgetting global state references in functions

I’m sure it’s happened to you too: you forget a reference to a global variable in the body of a new function and wonder why the code does not behave as expected. Annotating the cell with %%testcell noglobals or simply %%testcelln would uncover this dependency, raising an error that points out the missing reference as an undefined variable.

These are mine; what are your use cases?

ARCHITECTURE (aka some nitty gritty and nerd stuff)

Under the hood, testcell operates on a simple principle. We can see how it works using the verbose option.
testcell analyzes the cell and encapsulates the code in a temporary function; if the last line is eligible to be shown, its result is returned. That temporary function is then executed and deleted, and its result is sent to the standard Jupyter interpreter, which decides how to display it.

%%testcell verbose
a = "'a' is not polluting global scope"
a

Will have the following output:

### BEGIN
def _test_cell_():
    #| echo: false
    a = "'a' is not polluting global scope"
    return a # %%testcell
try:
    _ = _test_cell_()
finally:
    del _test_cell_
_ # This will be added to global scope
### END

"'a' is not polluting global scope"

It’s important to note that testcell has been written in pure Python without any external dependencies, so there is nothing else to install before you can use it.
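The wrap, run and delete principle shown in the verbose output can be sketched in a few lines with Python's standard ast module. This is only an illustrative approximation and not testcell's actual source: run_cell, FUNC_NAME and the notebook dictionary are names I made up, and the sketch ignores many of the edge cases a real cell can contain.

```python
import ast

FUNC_NAME = "_sketch_cell_"  # hypothetical name, chosen for this sketch


def run_cell(source: str, namespace: dict):
    """Run `source` inside a temporary function defined in `namespace`."""
    body = ast.parse(source).body
    if not body:
        return None
    last = body[-1]
    # A trailing bare expression becomes the return value, like a cell's output
    if isinstance(last, ast.Expr):
        body[-1] = ast.copy_location(ast.Return(value=last.value), last)

    # Start from an empty function definition and swap in the cell's statements
    module = ast.parse(f"def {FUNC_NAME}():\n    pass")
    module.body[0].body = body
    ast.fix_missing_locations(module)

    exec(compile(module, "<cell>", "exec"), namespace)
    try:
        return namespace[FUNC_NAME]()
    finally:
        del namespace[FUNC_NAME]  # the temporary function is removed afterwards


notebook = {"x": 10}  # stand-in for the notebook's globals
result = run_cell("a = x + 1\na", notebook)
print(result, "a" in notebook)  # prints: 11 False
```

The cell can still read x from the surrounding namespace, but a only ever exists as a local variable of the temporary function.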

The noglobals option (testcelln is a short form of testcell noglobals) executes your cell as if it were in a fresh new notebook, providing a kind of notebook-in-notebook experience. Consequently, it becomes harder to accidentally modify the notebook’s state outside of your cell, as you no longer have direct references to it. Moreover, testcell acts as a cell compiler, helping you identify any potentially harmful references to the global state that may exist within your new function.
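The effect of running a cell without the notebook's globals can be imitated in plain Python by executing the code in a fresh namespace that only contains the builtins. This is a rough sketch of the idea under that assumption, not a description of how testcell implements noglobals; run_without_globals, scale and rescale are hypothetical names.

```python
import builtins


def run_without_globals(source: str) -> dict:
    """Execute `source` in a fresh namespace that only knows the builtins."""
    fresh = {"__builtins__": builtins}
    exec(source, fresh)
    return fresh


scale = 3  # lives in the "notebook" namespace

cell = """
def rescale(values):
    return [v * scale for v in values]  # hidden dependency on a global

rescale([1, 2])
"""

try:
    run_without_globals(cell)
    outcome = "no hidden globals"
except NameError as err:
    outcome = f"hidden global found: {err}"

print(outcome)
```

Because scale is not defined in the fresh namespace, calling rescale raises a NameError, which is exactly the kind of forgotten dependency this option is meant to surface.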

If you wish to delve deeper into the inner workings of testcell, I encourage you to explore the GitHub notebooks located in the nbs folder. Thanks to nbdev, these notebooks serve as both documentation and source code.


THE STORY BEHIND IT:

I embarked on this to publish my first PyPI package and to try the nbdev workflow from start to finish, from initial coding to final deployment. I chose testcell because it seemed to be a simple “toy project” that (silly me!) I anticipated completing within a couple of hours.

The goal of testcell was to execute a Jupyter cell without allowing any variables or functions defined within it to impact the global state of the notebook. Moreover, I wanted that cell to behave like a typical Jupyter cell in all situations. Achieving this last specification proved to be the most challenging aspect of the project!

Initially, I opted for an under-engineered solution just to have a functional prototype and prove its value. I implemented it by extracting the last line of the cell and enclosing it within a display(…) call to support both __str__ and __repr__ cases. This worked, but it was buggy and far from behaving like a normal cell. Eventually, I decided to abandon this naive implementation and switched to a more advanced one that uses Python’s standard ast module (abstract syntax tree). Then I gradually added more options, such as verbose and dryrun, while exploring various use cases. Confident in its capabilities, I deployed version 0.0.1 as an official PyPI package using the nbdev_pypi command. Right after deployment, bugs and edge cases started to appear everywhere, like a call to arms.

While attempting to resolve a sneaky bug, I stumbled upon the idea of noglobals. This discovery took the project in a new direction, moving it beyond its original toy project goal. With the introduction of this option, testcell can act like a one-cell-compiler that alerts users if they use global state within their functions.

Finally, I abandoned the display-last-line approach in favor of return-last-line, to properly mimic standard Jupyter behavior, where the notebook decides how to present the result instead of being forced through a display call.

The downside: what I estimated to take just two hours ended up consuming more than four weekends (and that doesn’t even account for the time spent on this blog post), but the learning experience far outweighs the time investment.

So my advice is the same one I learned through fastai courses years ago: “Take on a problem, especially one that appears to be simple, and solve it end-to-end”.


I did this whole project forcing myself, as much as possible, to pair program with AI in three main areas: documenting, coding and thinking.

From a documentation standpoint, it assisted me in writing clean and proper English (otherwise, I would have had to constantly bother my native-speaker friends). Although it sometimes suggests words and constructs that are more formal than what I normally use, it provides a detailed explanation for each sentence modification; sometimes it feels like a personal English teacher.

From a coding perspective, it helped me in different domains by quickly providing ready-to-use stubs with all the necessary and well-documented boilerplate code. It’s like having an infinite Stack Overflow resource where you can always find examples that precisely match your requirements. It really gets you started, although you still need to choose the right stub (for instance, I specifically requested the use of ast instead of a third-party library) and may need to fine-tune it manually to deepen your understanding and avoid falling into the trap of continuously specifying more and more details until it yields the desired outcome.

From a reasoning standpoint, working with an LLM really feels like having a talking rubber duck that not only listens but also reviews your thoughts, challenging your ideas and offering alternative perspectives.

Long story short: I really felt impressed and empowered by these tools.

CONCLUSIONS

Once again, I seem to have veered off course with this blog post. Originally intended as an introduction to testcell, it has grown into a reflection on the importance of experimentation and continuous learning, as well as how these new AI tools will shape our future jobs and lives. Finally, I would like to thank Alex, Suvash, Laith and all the friends in the Delft study group who provided invaluable support and feedback on testcell, helping me identify and address various bugs along the way. Lastly, I extend my thanks, as always, to Alexis and Zach for their invaluable help and final touches on this blog post.



RECIPE FOR PORCINI MUSHROOM RISOTTO

Ingredients:

  • Rice: one small coffee cup per person, I suggest using Carnaroli or Vialone Nano rice.
  • Fresh porcini mushrooms: approximately one large mushroom for every two people or one small mushroom per person.
  • Dried porcini mushrooms: a handful, mainly used to flavor the broth.

Preparation:

  1. Soak the dried porcini mushrooms in water for a couple of hours until they are rehydrated. Then, drain them, but keep the soaking water.
  2. Prepare the broth by boiling a carrot and an onion in the soaking water of the dried porcini mushrooms. Taste and adjust the salt: the broth should be salted, not the risotto.
  3. Clean the fresh porcini mushrooms well, removing any dirt, and separate the stems from the caps. Slice the caps into thick slices and sauté them in a pan with olive oil and a clove of garlic. Cut the stems into small pieces and set them aside; we will add them to the risotto during cooking.

Risotto cooking:

  1. Start by toasting the rice in a dry pan for a couple of minutes over high heat. After two minutes, deglaze with white wine and wait until the alcohol has evaporated.
  2. Add the broth gradually to cover the rice and stir. Also, add the rehydrated dried porcini mushrooms and the stems of the fresh porcini mushrooms. They will cook together with the rice, imparting their flavors. Remember to always stir and add more broth as it evaporates.
  3. Continue cooking for 10-15 minutes, tasting occasionally, until you achieve the desired consistency.
  4. When the heat is off, stir in butter and Parmesan cheese.
  5. Plate the risotto and add the sautéed mushroom caps from earlier, some parsley, and a sprinkle of pepper.
  6. Enjoy your risotto with mushrooms!