Creating a distribution package

Distribution packages are archives that can be uploaded to a package index such as pypi.org and installed with pip.

Structure

A minimal distribution package can look like this, for example:

dataprep
├── pyproject.toml
└── src
    └── dataprep
        ├── __init__.py
        └── loaders.py

pyproject.toml

PEP 517 and PEP 518 brought extensible build backends, isolated builds and pyproject.toml in TOML format.

Among other things, pyproject.toml tells pip and build which backend tool to use to build distribution packages for your project. You can choose from a number of backends, though this tutorial uses hatchling by default.

A minimal yet functional dataprep/pyproject.toml file will then look like this, for example:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

build-system

defines a section describing the build system

requires

defines a list of dependencies that must be installed for the build system to work, in our case hatchling.

Note

Dependency version numbers should usually be written in the requirements.txt file, not here.

build-backend

identifies the entry point for the build-backend object as a dotted path. The hatchling backend object is available under hatchling.build.

Note

However, for Python packages that contain binary extensions with Cython, C, C++, Fortran or Rust, the hatchling backend is not suitable. One of the following backends should be used here:

But that’s not all – there are other backends:

Note

With validate-pyproject you can check your pyproject.toml file.

See also

If you want to look at alternatives to hatchling:

Metadata

In pyproject.toml you can also specify metadata for your package, such as:

[project]
name = "dataprep"
version = "0.1.0"
authors = [
  { name="Veit Schiele", email="veit@cusy.io" },
]
description = "A small dataprep package"
readme = "README.rst"
requires-python = ">=3.7"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: BSD License",
    "Operating System :: OS Independent",
]
dependencies = [
    "pandas",
]

[project.urls]
"Homepage" = "https://github.com/veit/dataprep"
"Bug Tracker" = "https://github.com/veit/dataprep/issues"

name

is the distribution name of your package. This can be any name as long as it contains only letters, numbers, ., _ and -. It should also not already be assigned on the Python Package Index (PyPI).
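
The naming rule can be checked before you pick a name. The following sketch assumes the pattern from the PyPA name specification, which additionally requires a name to start and end with a letter or digit, and the normalisation described in PEP 503. The function names are chosen for this example only.

```python
import re

# Valid names: letters, digits, ".", "_" and "-",
# starting and ending with a letter or digit.
VALID_NAME = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)


def is_valid_name(name: str) -> bool:
    """Check a candidate distribution name against the pattern above."""
    return VALID_NAME.match(name) is not None


def normalise_name(name: str) -> str:
    """Collapse runs of ".", "_" and "-" into "-" and lower-case the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(is_valid_name("dataprep"))  # True
print(is_valid_name("data prep"))  # False
print(normalise_name("Data_Prep.Tools"))  # data-prep-tools
```

Names that normalise to the same string count as the same project on the index, so it is worth comparing the normalised form when checking whether a name is already taken.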

version

is the version of the package.

In our example, the version number has been set statically. However, you can also specify the version dynamically, for example by reading it from a file:

[project]
...
dynamic = ["version"]
[tool.hatch.version]
path = "src/dataprep/__about__.py"

The default pattern looks for a variable called __version__ or VERSION, which contains the version, optionally preceded by the lower case letter v. The default pattern is based on PEP 440.

If this is not the way you want to store versions, you can define a different regular expression with the pattern option.
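
To illustrate what such a pattern does, here is a simplified sketch that extracts the version from the text of a file. The regular expression is an assumption written for this example and is not hatchling’s actual default pattern.

```python
import re

# Hypothetical pattern for illustration; hatchling's real default differs in detail.
VERSION_PATTERN = re.compile(
    r"""^(?:__version__|VERSION)\s*=\s*["']v?(?P<version>[^"']+)["']""",
    re.MULTILINE,
)


def read_version(source: str) -> str:
    """Return the version assigned to __version__ or VERSION in the given text."""
    match = VERSION_PATTERN.search(source)
    if match is None:
        raise ValueError("no version assignment found")
    return match.group("version")


# Stand-in for the content of src/dataprep/__about__.py.
ABOUT = '"""Package metadata."""\n__version__ = "v0.1.0"\n'
print(read_version(ABOUT))  # 0.1.0
```

The named group version plays the same role as the group that a custom pattern option would have to provide.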

There are also other version scheme plug-ins, such as hatch-semver for Semantic Versioning.

With the version source plugin hatch-vcs you can also use Git tags:

[build-system]
requires = ["hatchling", "hatch-vcs"]
...
[tool.hatch.version]
source = "vcs"
raw-options = { local_scheme = "no-local-version" }

The setuptools backend also allows dynamic versioning:

[build-system]
requires = ["setuptools>=61.0", "setuptools-scm"]
build-backend = "setuptools.build_meta"
[project]
...
dynamic = ["version"]
[tool.setuptools.dynamic]
version = {attr = "dataprep.VERSION"}

Tip

If the version is in several text files, the use of Bump My Version may be recommended.

The configuration file .bumpversion.toml can look like this, for example:

[tool.bumpversion]
current_version = "0.1.0"
parse = "(?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)"
serialize = ["{major}.{minor}.{patch}"]
search = "{current_version}"
replace = "{new_version}"
regex = false
ignore_missing_version = false
tag = false
sign_tags = false
tag_name = "v{new_version}"
tag_message = "Bump version: {current_version} → {new_version}"
allow_dirty = false
commit = false
message = "Bump version: {current_version} → {new_version}"
commit_args = ""

[[tool.bumpversion.files]]
filename = "src/dataprep/__init__.py"

[[tool.bumpversion.files]]
filename = "docs/conf.py"
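
The interplay of parse and serialize can be sketched in a few lines of Python. This is only an illustration of the idea with the two values from the configuration above; it is not the implementation of Bump My Version, and resetting the lower parts to zero is an assumption based on common practice.

```python
import re

# Values taken from the configuration above.
PARSE = r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
SERIALIZE = "{major}.{minor}.{patch}"
ORDER = ["major", "minor", "patch"]


def bump(current_version: str, part: str) -> str:
    """Increment one part and reset all parts to the right of it."""
    match = re.fullmatch(PARSE, current_version)
    if match is None:
        raise ValueError(f"cannot parse {current_version!r}")
    if part not in ORDER:
        raise ValueError(f"unknown part {part!r}")
    parts = {name: int(value) for name, value in match.groupdict().items()}
    parts[part] += 1
    for lower in ORDER[ORDER.index(part) + 1:]:
        parts[lower] = 0
    return SERIALIZE.format(**parts)


print(bump("0.1.0", "patch"))  # 0.1.1
print(bump("0.1.0", "minor"))  # 0.2.0
print(bump("0.1.9", "major"))  # 1.0.0
```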

authors

is used to identify the authors of the package by name and email address.

You can also list maintainers in the same format.

description

is a short summary of the package, consisting of one sentence.

readme

is a path to a file containing a detailed description of the package. This is displayed on the package details page on Python Package Index (PyPI). In this case, the description is loaded from README.rst.

requires-python

specifies the versions of Python that are supported by your project. This will cause installers like pip to search through older versions of packages until they find one that has a matching Python version.
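
The search through older versions can be pictured with the following sketch. The release history is invented for this example, and the function only handles lower bounds such as ">=3.7"; real installers evaluate complete version specifiers.

```python
from typing import Optional

# Hypothetical release history, oldest first: (package version, minimum Python).
RELEASES = [
    ("0.1.0", (3, 7)),
    ("0.2.0", (3, 9)),
    ("0.3.0", (3, 11)),
]


def newest_compatible(
    releases: list[tuple[str, tuple[int, int]]],
    python_version: tuple[int, int],
) -> Optional[str]:
    """Walk from the newest to the oldest release and return the first match."""
    for version, minimum in reversed(releases):
        if python_version >= minimum:
            return version
    return None


print(newest_compatible(RELEASES, (3, 13)))  # 0.3.0
print(newest_compatible(RELEASES, (3, 10)))  # 0.2.0
print(newest_compatible(RELEASES, (3, 6)))  # None
```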

classifiers

gives the Python Package Index (PyPI) and pip some additional metadata about your package. In this case, the package is only compatible with Python 3, is under the BSD licence and is OS independent. You should always at least specify the versions of Python your package runs under, under which licence your package is available and on which operating systems your package runs. You can find a complete list of classifiers at https://pypi.org/classifiers/.

They also have a useful additional feature: to prevent a package from being uploaded to PyPI, use the special classifier "Private :: Do Not Upload". PyPI will always reject packages whose classifier starts with "Private ::".

dependencies

specifies the dependencies for your package in an array.

See also

PEP 631

urls

lets you list any number of additional links that are displayed on the Python Package Index (PyPI). Typically, these links lead to the source code, the documentation, the issue tracker, etc.

Optional dependencies

project.optional-dependencies

allows you to specify optional dependencies for your package. You can also distinguish between different sets:

[project.optional-dependencies]
tests = [
    "coverage[toml]",
    "pytest>=6.0",
]
docs = [
    "furo",
    "sphinxext-opengraph",
    "sphinx-copybutton",
    "sphinx_inline_tabs"
]

Recursive optional dependencies are also possible with pip ≥ 21.2. For example, for dev you can take over all dependencies from docs and tests in addition to pre-commit:

dev = [
    "dataprep[tests, docs]",
    "pre-commit"
]
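
What the self-reference in dev amounts to can be sketched as follows. The function expand is a hypothetical helper for this illustration and not how pip resolves extras; it does not guard against extras that refer to each other in a cycle.

```python
# The optional dependencies from above as a plain dictionary.
EXTRAS = {
    "tests": ["coverage[toml]", "pytest>=6.0"],
    "docs": ["furo", "sphinxext-opengraph", "sphinx-copybutton", "sphinx_inline_tabs"],
    "dev": ["dataprep[tests, docs]", "pre-commit"],
}


def expand(extra: str, project: str = "dataprep") -> list[str]:
    """Replace references like "dataprep[tests, docs]" with the extras they name."""
    prefix = f"{project}["
    result = []
    for requirement in EXTRAS[extra]:
        if requirement.startswith(prefix) and requirement.endswith("]"):
            # A reference to the project itself: expand each named extra.
            for name in requirement[len(prefix):-1].split(","):
                result.extend(expand(name.strip(), project))
        else:
            result.append(requirement)
    return result


print(expand("dev"))
```

Requirements of other projects that carry extras themselves, such as coverage[toml], are passed through unchanged.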

You can install these optional dependencies, for example with:

$ cd /PATH/TO/YOUR/DISTRIBUTION_PACKAGE
$ python3 -m venv .venv
$ . .venv/bin/activate
$ python -m pip install --upgrade pip
$ python -m pip install -e ".[dev]"

> cd C:\PATH\TO\YOUR\DISTRIBUTION_PACKAGE
> python -m venv .venv
> .venv\Scripts\activate.bat
> python -m pip install --upgrade pip
> python -m pip install -e ".[dev]"

src package

When you create a new package, you shouldn’t use a flat layout but the src layout, which is also recommended in Packaging Python Projects of the PyPA. A major advantage of this layout is that tests are run with the installed version of your package and not with the files in your working directory.

Note

In Python ≥ 3.11 PYTHONSAFEPATH can be used to ensure that the installed packages are used first.

dataprep

is the directory that contains the Python files. The name should match the project name to simplify configuration and be more recognisable to those installing the package.

__init__.py

is required to import the directory as a package. This allows you to import the following:

import dataprep.loaders

or

from dataprep import loaders

Although __init__.py files are often empty, they can also contain code.
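
The following sketch shows both points in a throwaway directory: the directory becomes importable as a package, and code in __init__.py runs on import. The contents of the two files are invented for this example.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "dataprep"
    package.mkdir()
    # __init__.py may contain code, here a package-level constant.
    (package / "__init__.py").write_text('__version__ = "0.1.0"\n')
    # Hypothetical module content for illustration.
    (package / "loaders.py").write_text("def load():\n    return [1, 2, 3]\n")

    # Make the temporary directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    try:
        import dataprep
        from dataprep import loaders

        version = dataprep.__version__
        data = loaders.load()
    finally:
        sys.path.remove(tmp)

print(version)  # 0.1.0
print(data)  # [1, 2, 3]
```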

loaders.py

is an example of a module within the package that could contain the logic (functions, classes, constants, etc.) of your package.

Other files

CONTRIBUTORS.rst

LICENSE

You can find detailed information on this in the Licensing section.

README.rst

This file briefly tells those who are interested in the package how to use it.

If you write the document in reStructuredText, you can also include the contents as a detailed description in your package:

[project]
readme = "README.rst"

You can also include them in your Sphinx documentation with .. include:: ../../README.rst.

CHANGELOG.rst

Historical files or files needed for binary extensions

Before the pyproject.toml file introduced with PEP 518 became the standard, setuptools required setup.py, setup.cfg and MANIFEST.in. Today, however, these files are only needed for binary extensions at best.

If you want to replace these files in your packages, you can do so with hatch new --init or ini2toml.

setup.py

A minimal and yet functional dataprep/setup.py can look like this, for example:

from Cython.Build import cythonize
from setuptools import find_packages, setup

setup(
    ext_modules=cythonize("src/dataprep/cymean.pyx"),
)

In a setup() call, package_dir can point to the src directory, which can contain one or more packages. You can then use setuptools’s find_packages() to find all packages in this directory.

Note

find_packages() without src/ directory would package all directories with a __init__.py file, so also tests/ directories.

setup.cfg

This file is no longer needed, at least not for packaging. wheel nowadays collects all required licence files automatically and setuptools can build universal wheel packages with the options keyword argument, for example dataprep-0.1.0-py3-none-any.whl.

MANIFEST.in

The file contains all files and directories that are not already covered by packages or py_modules. A dataprep/MANIFEST.in can look like this:

include LICENSE *.rst *.toml *.yml *.yaml *.ini
graft src
recursive-exclude __pycache__ *.py[cod]

For more instructions in MANIFEST.in, see MANIFEST.in commands.

Note

People often forget to update the MANIFEST.in file. To avoid this, you can use check-manifest in a pre-commit hook.

Note

If you want files and directories from MANIFEST.in to be installed as well, for example if they are runtime-relevant data, you can specify this with include_package_data=True in your setup() call.

Create package structure

With uv init --package MYPACK you can easily create an initial file structure for packages.

$ uv init --package mypack
$ tree mypack -a
mypack
├── .git
│   └── ...
├── .gitignore
├── .python-version
├── README.md
├── pyproject.toml
└── src
    └── mypack
        └── __init__.py

mypack/pyproject.toml

The file pyproject.toml contains a scripts entry point mypack:main:

mypack/pyproject.toml
[project]
name = "mypack"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
authors = [
    { name = "Veit Schiele", email = "veit@cusy.io" }
]
requires-python = ">=3.13"
dependencies = []

[project.scripts]
mypack = "mypack:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

mypack/src/mypack/__init__.py

The module defines a CLI function main():

mypack/src/mypack/__init__.py
def main() -> None:
    print("Hello from mypack!")
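
How a reference such as mypack:main leads to a function can be sketched with importlib. The helper resolve is an assumption for this illustration and only handles the simple module:function form; json:dumps from the standard library stands in for mypack:main, because mypack is only importable once it is installed.

```python
import importlib
from typing import Any


def resolve(reference: str) -> Any:
    """Import the module before the colon and return the attribute after it."""
    module_name, _, attribute = reference.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Stand-in for resolve("mypack:main") in an environment with mypack installed.
dumps = resolve("json:dumps")
print(dumps({"ok": True}))  # {"ok": true}
```

When the package is installed, the installer generates a small mypack executable that performs this lookup and then calls the function.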

It can be called with uv run:

$ uv run mypack
Hello from mypack!

Note

If necessary, uv run creates a virtual Python environment in the .venv folder before main() is executed.

Build

The next step is to create distribution packages for the package. These are archives that can be uploaded to the Python Package Index (PyPI) and installed by pip. Now execute the command in the same directory where pyproject.toml is located:

$ uv build
Building source distribution...
Building wheel from source distribution...
  Successfully built dist/mypack-0.1.0.tar.gz and dist/mypack-0.1.0-py3-none-any.whl

> uv build
Building source distribution...
Building wheel from source distribution...
  Successfully built dist/mypack-0.1.0.tar.gz and dist/mypack-0.1.0-py3-none-any.whl

dist/mypack-0.1.0-py3-none-any.whl

is a built distribution. pip prefers to install built distributions and only uses the source distributions if no suitable built distribution is available. You should always upload a source distribution and provide built distributions for the platforms with which your project is compatible. In this case, our example package is compatible with Python on every platform, so only one built distribution is required:

mypack

is the normalised package name

0.1.0

is the version of the distribution package

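py3

is the Python tag: it specifies the Python implementation and version with which the wheel is compatible, in this case any Python 3

none

is the ABI tag: it specifies which C-ABI the wheel requires; none means that the wheel does not depend on a specific ABI

any

is the platform tag: any is suitable for any operating system and processor architecture, a tag ending in x86_64 on the other hand only for chips with the x86 instruction set and a 64-bit architecture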
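
The parts of the file name can also be taken apart programmatically. The following sketch assumes the file name convention of the wheel specification, {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl; the class and function names are made up for this example.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its components, dropping an optional build tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # The optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(*parts)


print(parse_wheel_filename("mypack-0.1.0-py3-none-any.whl"))
```

For a wheel with a binary extension, such as dataprep-0.1.0-cp313-cp313-macosx_13_0_arm64.whl, the same function returns cp313 for both the Python tag and the ABI tag and macosx_13_0_arm64 for the platform tag.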

mypack-0.1.0.tar.gz

is a source distribution.

See also

For more information on sdist, see Core metadata specifications and PyPA specifications.

Testing

$ mkdir test_env
$ cd test_env
$ python3 -m venv .venv
$ . .venv/bin/activate
$ python -m pip install dist/dataprep-0.1.0-cp313-cp313-macosx_13_0_arm64.whl
Processing ./dist/dataprep-0.1.0-cp313-cp313-macosx_13_0_arm64.whl
Collecting Cython (from dataprep==0.1.0)
  Using cached Cython-3.0.11-py2.py3-none-any.whl.metadata (3.2 kB)

Successfully installed Cython-3.0.11 dataprep-0.1.0 numpy-2.1.2 pandas-2.2.3 python-dateutil-2.9.0.post0 pytz-2024.2 six-1.16.0 tzdata-2024.2
> mkdir test_env
> cd test_env
> python -m venv .venv
> .venv\Scripts\activate.bat
> python -m pip install dist/dataprep-0.1.0-cp313-cp313-win_amd64.whl
Processing ./dist/dataprep-0.1.0-cp313-cp313-win_amd64.whl
Collecting Cython (from dataprep==0.1.0)
  Using cached Cython-3.0.11-cp313-cp313-win_amd64.whl.metadata (3.2 kB)

Successfully installed Cython-3.0.11 dataprep-0.1.0 numpy-2.1.2 pandas-2.2.3 python-dateutil-2.9.0.post0 pytz-2024.2 six-1.16.0 tzdata-2024.2

A pure Python wheel can be installed in the same way:

$ mkdir test_env
$ cd !$
cd test_env
$ python3 -m venv .venv
$ . .venv/bin/activate
$ python -m pip install dist/dataprep-0.1.0-py3-none-any.whl
Processing ./dist/dataprep-0.1.0-py3-none-any.whl
Collecting pandas
  Using cached pandas-1.3.4-cp39-cp39-macosx_10_9_x86_64.whl (11.6 MB)

Successfully installed dataprep-0.1.0 numpy-1.21.4 pandas-1.3.4 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0

Then you can check the wheel with:

$ python -m pip install check-wheel-contents
$ check-wheel-contents dist/*.whl
dist/dataprep-0.1.0-py3-none-any.whl: OK
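
If you only want a quick look at what ended up in the wheel, you can also list its contents yourself, because a wheel is a ZIP archive. The sketch below builds a small stand-in archive so that it runs without a previous build step; the file list in it is invented, and the helper wheel_contents is not part of check-wheel-contents.

```python
import tempfile
import zipfile
from pathlib import Path


def wheel_contents(path: Path) -> list[str]:
    """Return the sorted list of file names stored in a wheel."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in archive for the demonstration; a real wheel comes from the build step.
    wheel = Path(tmp) / "dataprep-0.1.0-py3-none-any.whl"
    with zipfile.ZipFile(wheel, "w") as archive:
        archive.writestr("dataprep/__init__.py", "")
        archive.writestr("dataprep/loaders.py", "")
        archive.writestr("dataprep-0.1.0.dist-info/METADATA", "Name: dataprep\n")
    contents = wheel_contents(wheel)

for name in contents:
    print(name)
```

With a real wheel you would pass the path of the file in dist/ to wheel_contents() instead.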

Alternatively, you can also install the package in a new project, for example in myapp:

$ python -m pip install dist/dataprep-0.1.0-py3-none-any.whl
Processing ./dist/dataprep-0.1.0-py3-none-any.whl
Collecting pandas

Installing collected packages: numpy, pytz, six, python-dateutil, pandas, dataprep
Successfully installed dataprep-0.1.0 numpy-1.21.4 pandas-1.3.4 python-dateutil-2.8.2 pytz-2021.3 six-1.16.0

You can then call Python and import your loaders module:

from dataprep import loaders

Note

There are still many instructions that include a step to call setup.py, for example python setup.py sdist. However, this is now considered an anti-pattern by parts of the Python Packaging Authority (PyPA).

Checks

  • If you want to create a task management package that writes the tasks to a database and provides them via a Python API and a command line interface (CLI), how would you structure the files?

  • Think about how you want to fulfil the above tasks. Which libraries and modules can you think of that could fulfil this task? Sketch the code for the modules of the Python API, the command line interface and the database connection.