From a quickly written Python script to a shared package¶

Techdays, Strasbourg, March 25, 2025

Matthieu Boileau

Sources | HTML slideshow | PDF slideshow

Common situation in a mathematics laboratory¶

To solve a problem, a researcher has written a Python script, often in the form of a Jupyter notebook. They wish to:

  • make it a publication,
  • develop it as a team,
  • make it available to collaborators.

Example: a notebook that calculates the propagation of a 1D linear wave¶

A right-travelling sine wave

notebook/linewave.ipynb

This script contains lots of brilliant ideas, but...

  • it is monolithic: no modularity
  • it mixes code and data:
    • if you want to change the input data, you have to modify the code, which multiplies the versions of the script
    • the produced data ends up in the code sources
  • it is only validated by its author: no review, no tests, no documentation

This presentation proposes to follow a path that leads from the isolated script to a tested, documented, installable, and published Python package.

The Stairway of Competence¶

This path can be seen as a stairway where each step corresponds to a skill to acquire:

Path to salvation

First step: splitting the script into functions and CLI¶

In [1]:
from IPython.display import Code
Code(filename='linewave/linewave.py', language="python")
Out[1]:
"""Solve the 1D wave equation using the leap-frog scheme"""

import argparse
import numpy as np
import matplotlib.pyplot as plt

c: float = 1.0  # Wave speed


def sinus(x: np.ndarray, t: float) -> np.ndarray:
    """
    Compute the analytical solution of the 1D wave equation
    
    Args:
        x: Grid points
        t: Time

    Returns:
        u: Solution of the wave equation
    """
    return np.sin(2 * np.pi * (x - c * t))


def compute_wave(
    L: float, T: float, CFL: float, N: int
) -> tuple[float, np.ndarray, np.ndarray]:
    """
    Compute the solution of the 1D wave equation using the leap-frog scheme

    Args:
        L: Length of the domain
        T: Final time
        CFL: CFL number
        N: Number of grid points

    Returns:
        t: Final time
        x: Grid points
        u: Solution of the wave equation
    """

    # Discretization (we remove the endpoint because of periodic boundary conditions)
    x, dx = np.linspace(0, L, N, endpoint=False, retstep=True)
    dt = CFL * dx / c  # Time step

    # Set initial solution
    un = sinus(x, 0.0)
    unm1 = sinus(x, -dt)

    # Leap-frog scheme
    t: float = 0.0
    while t < T:
        t += dt
        unp1 = (
            -unm1
            + 2 * un
            + CFL**2 * (np.roll(un, 1) - 2 * un + np.roll(un, -1))
        )
        # Exchange array references for avoiding a copy
        unm1, un, unp1 = un, unp1, unm1

    return t, x, un


def L2_error(t: float, x: np.ndarray, u: np.ndarray) -> float:
    """
    Compute the L2 error norm

    Args:
        t: Final time
        x: Grid points
        u: Solution of the wave equation

    Returns:
        L2 error norm
    """
    return np.linalg.norm(u - sinus(x, t)) / np.linalg.norm(sinus(x, t))


def plot(t: float, x: np.ndarray, u: np.ndarray):
    """
    Plot the solution using matplotlib
    
    Args:
        t: Final time
        x: Grid points
        u: Solution of the wave equation
    """
    plt.plot(x, u, "o", label=f"t = {t:.2f}")
    plt.plot(x, sinus(x, t), label="Analytical")
    plt.title(f"Leap-frog solution with N = {len(x)}")
    plt.xlabel("x")
    plt.ylabel("u")
    plt.legend()
    plt.show()


def main():
    """Main function with CLI"""
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "--L", type=float, default=1.0, help="Length of the domain"
    )
    parser.add_argument("--T", type=float, default=100.0, help="Final time")
    parser.add_argument("--CFL", type=float, default=0.99, help="CFL number")
    parser.add_argument(
        "--N", type=int, default=40, help="Number of grid points"
    )
    args = parser.parse_args()
    t, x, u = compute_wave(**vars(args))
    plot(t, x, u)
    print(f"L2 error norm: {L2_error(t, x, u):.3e}")


if __name__ == "__main__":
    main()

We can now run the script from the command line. We display the help:

In [2]:
%run linewave/linewave.py --help
usage: linewave.py [-h] [--L L] [--T T] [--CFL CFL] [--N N]

Solve the 1D wave equation using the leap-frog scheme

options:
  -h, --help  show this help message and exit
  --L L       Length of the domain (default: 1.0)
  --T T       Final time (default: 100.0)
  --CFL CFL   CFL number (default: 0.99)
  --N N       Number of grid points (default: 40)

And run it with its default values:

In [3]:
%run linewave/linewave.py
L2 error norm: 1.289e-02

Or with other parameters:

In [4]:
%run linewave/linewave.py --T 1000 --N 20
L2 error norm: 5.134e-01

In this 1D configuration, the method is exact for CFL = 1:

In [5]:
%run linewave/linewave.py --T 1000 --N 20 --CFL 1.
L2 error norm: 2.273e-09
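
Why CFL = 1 is exact can be checked by hand. Writing C for the CFL number and u_j^n for the value at x_j = jΔx and t_n = nΔt, the update in compute_wave is the leap-frog scheme below (np.roll(un, 1) and np.roll(un, -1) provide the periodic neighbours u_{j-1}^n and u_{j+1}^n). For C = 1 it reduces to a three-term relation that any exact solution f(x - ct) + g(x + ct), sampled on the grid, satisfies identically:

```latex
u_j^{n+1} = -u_j^{n-1} + 2u_j^{n} + C^2\left(u_{j+1}^{n} - 2u_j^{n} + u_{j-1}^{n}\right),
\qquad C = \frac{c\,\Delta t}{\Delta x}

\text{For } C = 1:\quad u_j^{n+1} = u_{j+1}^{n} + u_{j-1}^{n} - u_j^{n-1}

\text{With } c\,\Delta t = \Delta x:\quad u(x_j, t_n) = F(j-n) + G(j+n),
\qquad F(k) = f(k\,\Delta x),\ G(k) = g(k\,\Delta x)

F(j+1-n) + F(j-1-n) - F\big(j-(n-1)\big) = F\big(j-(n+1)\big),
\qquad
G(j+1+n) + G(j-1+n) - G\big(j+(n-1)\big) = G\big(j+(n+1)\big)
```

Since compute_wave initialises un and unm1 with the exact solution at t = 0 and t = -Δt, the scheme then reproduces the exact solution at the grid points; the remaining 2.273e-09 error is presumably floating-point round-off, for instance from accumulating t += dt.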

Unit tests with pytest¶


We test several functions of the linewave module by writing a file named test_linewave.py:

In [6]:
Code(filename='test_linewave.py')
Out[6]:
import numpy as np
from pytest import approx

from linewave.linewave import sinus, compute_wave, L2_error


def test_linewave():
    t, x, u = compute_wave(T=50, N=80, CFL=0.99, L=2.0)
    assert x.max() == 2.0 - 2.0 / 80
    assert t >= 50
    assert L2_error(t, x, u) < 0.01


def test_analytical_solution():
    x = np.linspace(0.0, 1.5, 50)
    assert sinus(x, t=3) == approx(np.sin(2 * np.pi * x), abs=1e-14)


def test_L2_error():
    x = np.linspace(0.0, 1.5, 50)
    u = np.sin(2 * np.pi * x)
    # arrays are the same
    assert L2_error(t=0, x=x, u=u) == approx(0.0, abs=1e-16)
    # arrays are shifted by a phase
    assert L2_error(t=3, x=x, u=u) == approx(0.0, abs=1e-14)

Now let's run the tests:

In [7]:
!pytest -v
============================= test session starts ==============================
platform darwin -- Python 3.11.11, pytest-8.3.5, pluggy-1.5.0 -- /Users/boileau/Documents/Conf/2025/Techdays2025/script2package/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/boileau/Documents/Conf/2025/Techdays2025/script2package
plugins: anyio-4.9.0
collected 3 items                                                              

test_linewave.py::test_linewave PASSED                                   [ 33%]
test_linewave.py::test_analytical_solution PASSED                        [ 66%]
test_linewave.py::test_L2_error PASSED                                   [100%]

============================== 3 passed in 0.17s ===============================

Some guides to start with pytest:

  • pyopensci guide: why you should write tests,
  • pytest guide: some good practices.

GitLab & Co¶

At this stage, we take a HUGE shortcut: we use cookiecutter to create a GitLab project from a Python package skeleton:

cookiecutter https://gitlab.math.unistra.fr/boileau/cookiecutter.git --directory python/irma

This command:

  • creates a Python project containing:
    • the python sources in src/linewave,
    • the tests in tests/,
    • the Sphinx documentation in docs/,
    • a LICENSE file,
    • a README.md file,
    • a CHANGELOG.md file,
    • the pyproject.toml file, which describes the Python project,
    • the .gitlab-ci.yml file, which describes the continuous integration pipeline,
  • publishes this project on GitLab at the address https://gitlab.math.unistra.fr/boileau/linewave

Project structure¶

linewave
├── pyproject.toml
├── LICENSE
├── README.md
├── CHANGELOG.md
├── docs
│   ├── make.bat
│   ├── Makefile
│   └── source
│       ├── conf.py
│       ├── installation.md
│       └── index.md
├── src
│   └── linewave
│       ├── linewave.py
│       └── __init__.py
└── tests
    └── test_linewave.py

Benefits of putting the linewave package in a src directory:

  • the package is isolated from the tests and the documentation,
  • other Python files at the project root are not picked up as part of the package,
  • running Python or pytest from the project root cannot import the local, uninstalled sources by accident: the tests run against the installed package,
  • it encourages a clearer separation of concerns within the project structure.

See the setuptools documentation for more details.

README file¶

The README.md file is the entry point of the project. It should contain at least:

  • a description of the project,
  • short instructions to install and try the project,
  • a link to the documentation.

A common practice is to include badges in the README.md file. For example:

  • the build status badge,
  • the coverage badge,
  • the license badge,
  • the version badge.

LICENSE file¶

The LICENSE file contains the license under which the project is distributed. A license is mandatory if you want to share your project.

If a project has no license, default copyright law applies: nobody else is allowed to reuse, modify, or redistribute the code!

Find your license on choosealicense.com.

CHANGELOG file¶

The CHANGELOG.md file is a log of changes made to the project. For each release, it should contain:

  • the version number,
  • the release date,
  • a description of the changes.

keepachangelog.com is a simple guide that promotes a consistent format for changelogs. In particular, it recommends using Semantic Versioning, where version numbers have the form MAJOR.MINOR.PATCH and are incremented as follows:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards compatible manner, and
  • PATCH version when you make backwards compatible bug fixes.
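
The increment rules above fit in a small helper. This is a hypothetical sketch (bump_version is not part of linewave or of any packaging tool), which deliberately ignores pre-release labels such as 1.0.0-rc.1:

```python
def bump_version(version: str, part: str) -> str:
    """Return the next Semantic Versioning number (MAJOR.MINOR.PATCH).

    part is "major", "minor" or "patch"; lower-order fields are reset to 0.
    Pre-release and build suffixes (e.g. 1.0.0-rc.1) are not handled here.
    """
    major, minor, patch = (int(field) for field in version.split("."))
    if part == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # new backward compatible functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backward compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.1", "minor"))  # 0.2.0
print(bump_version("0.2.0", "major"))  # 1.0.0
```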

Anatomy of pyproject.toml¶

In [8]:
Code(filename='../linewave/pyproject.toml') 
Out[8]:
[build-system]
requires = ["setuptools>=61.2"]
build-backend = "setuptools.build_meta"

[project]
name = "linewave"
authors = [
  { name = "Matthieu Boileau", email = "matthieu.boileau@math.unistra.fr" },
]
description = "A 1D linear wave solver"
classifiers = [
  "Programming Language :: Python :: 3",
  "License :: OSI Approved :: MIT License",
  "Operating System :: OS Independent",
]
requires-python = ">=3.8"
dependencies = []  # Add your python dependencies here
dynamic = ["version"]  # the version is defined in the [tool.setuptools.dynamic.version] section

[project.optional-dependencies]
test = ["pytest", "pytest-cov"]
doc = [
  "Sphinx >= 7.2.2",  # 7.2.2 is the first version that supports Python 3.9
  "myst-parser",  # Markdown support for Sphinx
  "furo",  # A modern theme for Sphinx
  "sphinx-copybutton",  # Add copy buttons to code blocks
  "sphinx-autobuild",  # Auto-rebuild Sphinx documentation when editing
]

[project.license]
text = "MIT"

[project.readme]
file = "README.md"
content-type = "text/markdown"

[project.urls]
Homepage = "https://gitlab.math.unistra.fr/boileau/linewave"  
Documentation = "https://boileau.pages.math.unistra.fr/linewave"  # Hosted on GitHub or GitLab using Pages
Repository = "https://gitlab.math.unistra.fr/boileau/linewave.git"
Issues = "https://gitlab.math.unistra.fr/boileau/linewave/-/issues"
Changelog = "https://gitlab.math.unistra.fr/boileau/linewave/-/blob/main/CHANGELOG.md"

[project.scripts]
# An entry for the command line interface
"linewave" = "linewave.linewave:main"

[tool.setuptools]
include-package-data = true  # Include non-python files in the package
license-files = ["LICENSE"]  # Include the license file in the package

[tool.setuptools.package-dir]
"" = "src"  # The package is in the src directory

[tool.setuptools.packages.find]
where = ["src"]  # Look for packages in the src directory
namespaces = false  # Do not use namespace packages

[tool.setuptools.dynamic.version]  # The version is defined in the __init__.py file
attr = "linewave.__version__"

A build-system section, which depends on the build backend used:

[build-system]
requires = ["setuptools>=61.2"]
build-backend = "setuptools.build_meta"

Here, we use setuptools as the build backend, but we could use flit-core, poetry-core, hatchling, etc.

A project section that describes the project:

[project]
name = "linewave"
authors = [
  { name = "Matthieu Boileau", email = "matthieu.boileau@math.unistra.fr" },
]
description = "A 1D linear wave solver"
classifiers = [
  "Programming Language :: Python :: 3",
  "License :: OSI Approved :: MIT License",
  "Operating System :: OS Independent",
]
requires-python = ">=3.8"
dependencies = ["numpy", "matplotlib"]  # Add your python dependencies here
dynamic = ["version"]  # the version is defined in the __init__.py file

A project.optional-dependencies section that lists the optional dependencies:

[project.optional-dependencies]
test = ["pytest", "pytest-cov"]
doc = [
  "Sphinx >= 7.2.2",  # 7.2.2 is the first version that supports Python 3.9
  "myst-parser",  # Markdown support for Sphinx
  "furo",  # A modern theme for Sphinx
  "sphinx-copybutton",  # Add copy buttons to code blocks
  "sphinx-autobuild",  # Auto-rebuild Sphinx documentation when editing
]

So the optional dependencies can be installed with:

pip install -e ".[doc,test]"

Information sections that describe the project:

[project.license]
text = "MIT"

[project.readme]
file = "README.md"
content-type = "text/markdown"

[project.urls]
Homepage = "https://gitlab.math.unistra.fr/boileau/linewave"  
Documentation = "https://boileau.pages.math.unistra.fr/linewave"
Repository = "https://gitlab.math.unistra.fr/boileau/linewave.git"
Issues = "https://gitlab.math.unistra.fr/boileau/linewave/-/issues"
Changelog = "https://gitlab.math.unistra.fr/boileau/linewave/-/blob/main/CHANGELOG.md"

A scripts section that declares the command-line entry points to be installed:

[project.scripts]
# An entry for the command line interface
"linewave" = "linewave.linewave:main"

So the command line interface can be run from anywhere with:

linewave

This command will run the main() function in the linewave module of the linewave package.
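
The value "linewave.linewave:main" follows the module:function syntax of entry points: at installation, pip generates a small linewave executable that, roughly, imports the module and calls the function. The standard importlib.metadata.EntryPoint class resolves such a reference. In the sketch below, json:dumps is a stand-in chosen only so that the example runs without linewave installed:

```python
from importlib.metadata import EntryPoint

# A console-script entry point is a "module:attribute" reference.
# The standard json module stands in for linewave.linewave here.
ep = EntryPoint(name="demo", value="json:dumps", group="console_scripts")

func = ep.load()           # imports json, then returns json.dumps
print(ep.module, ep.attr)  # json dumps
print(func({"N": 40}))     # {"N": 40}
```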

Some setuptools options:

[tool.setuptools.package-dir]
"" = "src"  # The package is in the src directory

[tool.setuptools.packages.find]
where = ["src"]  # Look for packages in the src directory
namespaces = false  # Do not use namespace packages

[tool.setuptools.dynamic.version]  # The version is defined in the __init__.py file
attr = "linewave.__version__"
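
With attr = "linewave.__version__", the version is defined in a single place, src/linewave/__init__.py, which is the file updated at each release. To our understanding, setuptools first tries to read such a constant statically (without importing the package, whose dependencies may not be installed yet) and only imports it as a fallback. The sketch below mimics the static reading with the standard ast module; the __init__.py content and read_version are assumptions for illustration:

```python
import ast

# Hypothetical content of src/linewave/__init__.py
INIT_SOURCE = '''
"""A 1D linear wave solver"""

__version__ = "0.1.0"
'''


def read_version(source: str) -> str:
    """Return the string assigned to __version__, without importing the module."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "__version__"
            for target in node.targets
        ):
            # Evaluate only literal values (strings, numbers, ...), never code
            return ast.literal_eval(node.value)
    raise LookupError("no __version__ assignment found")


print(read_version(INIT_SOURCE))  # 0.1.0
```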

Installing the package¶

Thanks to the pyproject.toml file, the project can be installed with:

pip install .

The package is then installed in the current environment and can be imported from any Python module, whatever the current directory:

import linewave

or run from the command line from anywhere:

linewave

This plays an important role in separating the data from the code: simulations can be run from a dedicated data directory, without copying or editing the sources.

Editable mode¶

When developing the project, it is recommended to install it in editable mode:

pip install -e .

This way, changes made to the project sources take effect without reinstalling the package (changes to pyproject.toml, such as new dependencies or entry points, still require a reinstall).
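
To check which copy of a package Python actually imports (the project sources or a copy in site-packages), you can ask the import system without importing the package. A sketch with the standard importlib.util.find_spec; import_origin is a hypothetical helper name:

```python
import importlib.util
from typing import Optional


def import_origin(package: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(package)
    return spec.origin if spec is not None else None


# After `pip install -e .`, import_origin("linewave") should point into the
# project's src/linewave/ directory; after `pip install .`, into site-packages.
print(import_origin("json"))                 # path to the stdlib json/__init__.py
print(import_origin("no_such_package_xyz"))  # None
```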

Creating a documentation¶


Sphinx is the standard documentation tool of the Python ecosystem; the official Python documentation itself is built with it.

Remember that you are writing for two kinds of readers:

  • the users of your package,
  • the developers of your package.

A Sphinx documentation is essentially composed of a source directory (docs/source in this project) containing:

  • a conf.py file that configures the documentation,
  • an index.rst or index.md file, which is the entry point of the documentation,
  • other .rst or .md files that contain the documentation itself.

Auto-generating the documentation¶

By using the autodoc extension, you can automatically generate the documentation of the modules, classes and functions of your package.

The main benefit is that the documentation source stays very close to the code: if you write your docstrings carefully and keep them up to date, your documentation will always reflect the latest changes in your code. Otherwise, it will silently drift away from it.
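
The docstrings of linewave.py use the Google style (Args:, Returns:), whereas autodoc expects reStructuredText field lists by default; the sphinx.ext.napoleon extension converts the former. A minimal conf.py could look like the sketch below. The file generated by the cookiecutter template may differ; these values are assumptions based on the doc extras of pyproject.toml:

```python
# docs/source/conf.py (a minimal sketch; adapt the values to your project)
project = "linewave"
author = "Matthieu Boileau"

extensions = [
    "sphinx.ext.autodoc",   # pull the documentation from the docstrings
    "sphinx.ext.napoleon",  # parse Google-style "Args:" / "Returns:" sections
    "myst_parser",          # write pages in Markdown (listed in the doc extras)
    "sphinx_copybutton",    # copy buttons on code blocks (listed in the doc extras)
]

autodoc_typehints = "description"  # show type hints in the parameter descriptions
html_theme = "furo"                # theme listed in the doc extras
```

A Markdown page can then call the automodule directive on linewave.linewave with the :members: option, for instance inside a MyST {eval-rst} block.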

Building the documentation¶

To build the documentation locally, you may use:

sphinx-autobuild docs/source docs/build

This command will:

  • build the documentation in the docs/build directory,
  • start a web server that serves the documentation at http://localhost:8000,
  • rebuild the documentation each time a file in the docs/source directory is modified.

Publishing the documentation¶

Your documentation can be published online using GitLab Pages (see below) or Read the Docs.

Starting with Sphinx¶

Sphinx is a vast tool with many features and possible extensions. You may start with this nice guide from pyopensci. And for more information, see the Sphinx documentation.

An overview of GitLab-CI¶

CI stands for Continuous Integration. It is the practice of checking every change integrated into the code base with an automated build (including tests), so that errors are detected as early as possible.

Principles of the GitLab CI/CD pipeline:

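  • it is defined in a .gitlab-ci.yml file at the root of the project,
  • it is triggered by each push to GitLab (and by other events, such as the creation of a tag),
  • it is composed of stages (build, test, deploy, etc.) that run one after the other,
  • each stage is composed of jobs that can run in parallel,
  • each job is executed by a GitLab runner whose tags match the job's tags.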

A GitLab runner is a service that can be installed on any machine (even your laptop) to execute the jobs of the pipeline. There are two main types of runners:

  • shared (instance) runners: managed by the GitLab administrator and available to all projects,
  • project runners (formerly called specific runners): registered and managed by the project maintainers.

A runner using the Docker executor runs each job in a fresh Docker container, started from the image specified in .gitlab-ci.yml. This provides a consistent and reproducible environment for all jobs in the pipeline.
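
To make these rules concrete, here is a deliberately simplified model (not GitLab's actual implementation) of how the jobs of the linewave .gitlab-ci.yml, shown in the next section, are scheduled: stages run in order, jobs of a stage run in parallel, and only/rules decide which jobs exist for a given branch or tag. JOBS and plan are hypothetical names, and job failures (which stop the later stages) are not modelled:

```python
from typing import Optional

# Jobs of the linewave .gitlab-ci.yml, reduced to what decides when they run
STAGES = ["test", "build", "deploy", "release"]
JOBS = {
    "test": {"stage": "test"},
    "doc": {"stage": "test"},
    "pages": {"stage": "deploy", "only_branch": "main"},           # only: [main]
    "release_to_gitlab": {"stage": "release", "needs_tag": True},  # if: $CI_COMMIT_TAG
}


def plan(branch: Optional[str], tag: Optional[str]) -> list[list[str]]:
    """Group the jobs that would run, stage by stage."""
    stages = []
    for stage in STAGES:
        jobs = []
        for name, job in JOBS.items():
            if job["stage"] != stage:
                continue
            if job.get("only_branch") and job["only_branch"] != branch:
                continue  # e.g. pages only runs on the main branch
            if job.get("needs_tag") and not tag:
                continue  # release job only runs in tag pipelines
            jobs.append(name)
        if jobs:  # a stage without jobs (here, build) is skipped
            stages.append(jobs)
    return stages


print(plan(branch="main", tag=None))     # [['test', 'doc'], ['pages']]
print(plan(branch="develop", tag=None))  # [['test', 'doc']]
print(plan(branch=None, tag="v0.1.0"))   # [['test', 'doc'], ['release_to_gitlab']]
```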

A .gitlab-ci.yml file¶

In [9]:
Code(filename="../linewave/.gitlab-ci.yml")
Out[9]:
default:
  image: "python:3.12"  # use the specified docker image
  tags:
    - docker

stages:
  - test
  - build
  - deploy
  - release

before_script:
  ## Prepare python virtual environment
  - python -m venv .venv/
  - source .venv/bin/activate
  - pip install -U pip

test:
  stage: test
  script:
    - pip install -e .[test]  # install test dependencies
    - pytest --durations=0 --cov=src/linewave -sv  # run pytest with code coverage
    - coverage html -d public/coverage  # generate coverage html report
  coverage: '/(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$/'  # extract the coverage rate for the badge
  artifacts:
    paths:
      - public  # export the coverage html report

doc:
  stage: test  # so the documentation is built in parallel with the tests
  script:
    - pip install -e .[doc]  # install doc dependencies
    - sphinx-build -b html docs/source/ public/  # build the documentation
  artifacts:
    paths:
      - public  # export the documentation

pages:
  stage: deploy
  before_script: []  # no need to prepare python virtual environment
  script:
    - echo 'Nothing to do...'  # public dir is already prepared by the previous jobs
  only:
    - main  # deploy the documentation only for the main branch
  artifacts:
    paths:
      - public  # export the coverage and documentation


release_to_gitlab:  # Create a release on GitLab from the git tag
  stage: release
  before_script: []  # no need to prepare python virtual environment
  image: registry.gitlab.com/gitlab-org/release-cli:latest
  rules:
    - if: $CI_COMMIT_TAG # Run this job when a tag is created
  script:
    - echo "running release_job"
  release: # See https://docs.gitlab.com/ee/ci/yaml/#release for available properties
    tag_name: "$CI_COMMIT_TAG"
    description: "$CI_COMMIT_TAG_MESSAGE"
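
The coverage: keyword of the test job holds a regular expression that GitLab applies to the job log to extract the coverage rate displayed by the badge. GitLab evaluates it with its own regex engine, but this particular pattern behaves the same in Python, which makes it easy to check against a TOTAL line in the format printed by pytest-cov (the figures below are invented for the example):

```python
import re

# The `coverage:` regex of the test job, in Python syntax
COVERAGE_RE = re.compile(r"(?i)total.*? (100(?:\.0+)?\%|[1-9]?\d(?:\.\d+)?\%)$")

# Last line of a pytest-cov terminal report (made-up numbers)
line = "TOTAL                           42      2    95%"

match = COVERAGE_RE.search(line)
print(match.group(1) if match else "no coverage found")  # 95%
```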

Editing the project skeleton¶

We copy our code into the skeleton:

In [10]:
!cp linewave/linewave.py ../linewave/src/linewave/
!cp test_linewave.py ../linewave/tests/
Then:

  • we install linewave locally in editable mode,
  • we verify that the tests pass,
  • we push it to GitLab.

Creating a release¶

The Git workflow can be illustrated as follows:

Git workflow diagram

Steps to create a release¶

  1. In the develop branch:

    1. update the CHANGELOG.md file with the changes made since the last release.
    2. increase the version number in the __init__.py file.
    3. commit and push the changes.
    4. create a merge request to the main branch.
    5. merge the merge request if the pipeline is successful.
  2. On GitLab, create a tag on the main branch, named after the version number, with the CHANGELOG entry as its message. Creating the tag triggers the release_to_gitlab CI job.

  3. Once the gitlab-ci pipeline is finished, the release is created on GitLab and the release badge is updated.
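
Step 2 uses the CHANGELOG entry as the tag message, which then becomes the release description through $CI_COMMIT_TAG_MESSAGE. Copying it by hand is error-prone; the hypothetical helper below extracts the section of a given version from a changelog in the Keep a Changelog layout (the changelog content is made up for the example):

```python
import re

# Hypothetical CHANGELOG.md content in the Keep a Changelog layout
CHANGELOG = """# Changelog

## [Unreleased]

## [0.1.0] - 2025-03-25

### Added

- Leap-frog solver for the 1D linear wave equation
- `linewave` command-line interface
"""


def changelog_entry(changelog: str, version: str) -> str:
    """Return the body of the '## [version]' section, up to the next '## [' header."""
    pattern = rf"^## \[{re.escape(version)}\][^\n]*\n(.*?)(?=^## \[|\Z)"
    match = re.search(pattern, changelog, flags=re.MULTILINE | re.DOTALL)
    if match is None:
        raise LookupError(f"version {version} not found in changelog")
    return match.group(1).strip()


print(changelog_entry(CHANGELOG, "0.1.0"))
```

The result could then serve as the message of an annotated tag, for instance with git tag -a v0.1.0 -m "...".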

Conclusion¶

Once you have separated the code from the data, your code can use:

  • a version control system: Git and GitLab
  • a testing framework: pytest using the tests/ directory
  • a documentation framework: README, LICENSE, CHANGELOG, docstring + Sphinx
  • a packaging tool: setuptools using pyproject.toml
  • a CI pipeline: GitLab-CI using .gitlab-ci.yml

Your python project is now a package that can be installed with:

# Private gitlab project
pip install git+ssh://git@gitlab.math.unistra.fr/boileau/linewave.git@v0.1.0

# Public gitlab project
pip install git+https://gitlab.math.unistra.fr/boileau/linewave.git@v0.1.0

# Official PyPI repository if a PyPI job is defined in the pipeline (see Extra)
pip install linewave
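
Whichever command is used, you can check which version ended up in the environment from the package metadata, without importing the package. A sketch with the standard importlib.metadata module; installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version (e.g. 0.1.0) if linewave is installed, None otherwise
print(installed_version("linewave"))
```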

This adds a significant amount of boilerplate, but it can easily be generated with a templating tool such as cookiecutter.

However, this apparent ease can be deceptive, as it requires mastering:

  • version tracking with git,
  • the environment of a GitLab forge,
  • the principles of the GitLab-CI workflow,
  • the basics of packaging, testing, and documentation in python.

Next steps to climb¶

  • Add contribution guidelines and a code of conduct: they require a consensus among the early contributors. See this little guide.
  • Enforce a code style: See Black and the pre-commit tool.
  • Reference your code on Software Heritage: easy, no excuse!
    • Only a codemeta.json file in the project root is needed
    • Software Heritage will automatically archive the code and metadata
    • It will produce a citation and a unique identifier for the code and its versions
  • Make your code reproducible: pretty hard but should be considered! See Guix.
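
A codemeta.json file is JSON-LD following the CodeMeta vocabulary; tools such as the CodeMeta generator can produce it. As a rough sketch, a minimal file could be generated from Python as below; the fields reuse the metadata of pyproject.toml, while the version number and the choice of fields are assumptions:

```python
import json

# Minimal CodeMeta 2.0 description (version number assumed)
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "linewave",
    "description": "A 1D linear wave solver",
    "version": "0.1.0",
    "license": "https://spdx.org/licenses/MIT",
    "codeRepository": "https://gitlab.math.unistra.fr/boileau/linewave",
    "programmingLanguage": "Python",
    "author": [
        {
            "@type": "Person",
            "givenName": "Matthieu",
            "familyName": "Boileau",
            "email": "matthieu.boileau@math.unistra.fr",
        }
    ],
}

# Save this text as codemeta.json at the project root
text = json.dumps(codemeta, indent=2)
print(text)
```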

References¶

  • https://learn.scientific-python.org/development/
  • https://www.pyopensci.org/python-packaging-science.html
  • https://packaging.python.org

Extra: deploying the package on PyPI¶

Another job in the pipeline can deploy the package to the Python Package Index (PyPI) or to a private repository.

deploy_to_pypi:  # Deploy the package to PyPI
  stage: release
  before_script: []  # no need to prepare python virtual environment
  image: python:3.12
  rules:
    - if: $CI_COMMIT_TAG # Run this job when a tag is created
  script:
    - pip install --upgrade pip
    - pip install build twine
    - python -m build
    - TWINE_PASSWORD=${PYPI_TOKEN} TWINE_USERNAME=__token__ python -m twine upload dist/*

This requires you to:

  • create an account on pypi.org (the project itself is created by the first upload),
  • create a PyPI API token, scoped to the project once it exists,
  • store this token in the GitLab CI/CD settings as a masked variable named PYPI_TOKEN.
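
Since the upload runs for any tag, and PyPI refuses to re-upload a file for a version that already exists, a guard that stops the job when the tag and the package version disagree can save a bad release. A hypothetical sketch of such a check, which could run as an extra script line before python -m build, assuming tags of the form v0.1.0:

```python
import os

# Version declared by the package; in the CI job it could be obtained with
# `python -c "import linewave; print(linewave.__version__)"` (assumption).
PACKAGE_VERSION = "0.1.0"


def check_tag(tag: str, package_version: str) -> None:
    """Stop with an error if the Git tag (e.g. v0.1.0) does not match the version."""
    expected = f"v{package_version}"
    if tag != expected:
        raise SystemExit(f"tag {tag!r} does not match package version {expected!r}")


# GitLab sets CI_COMMIT_TAG in tag pipelines; the default is for local testing only
check_tag(os.environ.get("CI_COMMIT_TAG", "v0.1.0"), PACKAGE_VERSION)
print("tag and package version agree")
```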