Creating a distribution package¶
Distribution Packages are archives that can be uploaded to a package index such as pypi.org and installed with pip.
Structure¶
A minimal distribution package can look like this, for example:
dataprep
├── pyproject.toml
└── src
    └── dataprep
        ├── __init__.py
        └── loaders.py
pyproject.toml¶
PEP 517 and PEP 518 brought extensible build backends, isolated builds and pyproject.toml in TOML format.
Among other things, pyproject.toml tells pip and build
which backend tool to use to build distribution packages for your project. You
can choose from a number of backends, though this tutorial uses hatchling by
default.
A minimal yet functional dataprep/pyproject.toml file will then look
like this, for example:
[build-system]
requires = ["hatchling>=1.27"]
build-backend = "hatchling.build"
build-system
    defines a section describing the build system.
requires
    defines a list of dependencies that must be installed for the build system to work, in our case hatchling>=1.27 for PEP 639 support.

    Note
    Dependency version numbers should usually be written in the constraints.txt file, not here.
build-backend
    identifies the entry point for the build-backend object as a dotted path. The hatchling backend object is available under hatchling.build.

    Note
    For Python packages that contain binary extensions written in Cython, C, C++, Fortran or Rust, the hatchling backend is not suitable; one of the backends designed for compiled extensions should be used instead.
Note
With check-toml,
pyproject-fmt and validate-pyproject you can format and
check your pyproject.toml file.
See also
If you want to look at alternatives to hatchling, common choices include setuptools, Flit and PDM.
Metadata¶
In pyproject.toml you can also specify metadata for your package, such
as:
[project]
name = "dataprep"
version = "0.1.0"
description = "A small dataprep package"
readme = "README.rst"
authors = [
    { name = "Veit Schiele", email = "veit@cusy.io" },
]
requires-python = ">=3.9"
classifiers = [
    "License :: OSI Approved :: BSD License",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 3 :: Only",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
    "Programming Language :: Python :: 3.14",
]
dependencies = [
    "furo",
    "sphinx-copybutton",
    "sphinx-inline-tabs",
    "sphinxext-opengraph",
]

[dependency-groups]
tests = [
    "coverage[toml]",
    "pytest>=6",
]
name
    is the distribution name of your package. This can be any name as long as it contains only letters, numbers, ., _ and -. It should also not already be taken on the Python Package Index (PyPI).
version
    is the version of the package. In our example, the version number has been set statically. However, it is also possible to specify the version dynamically, for example from a file:

    [project]
    ...
    dynamic = ["version"]

    [tool.hatch.version]
    path = "src/dataprep/__about__.py"

    The default pattern looks for a variable called __version__ or VERSION that contains the version, optionally preceded by the lower-case letter v. The default pattern is based on PEP 440. If this is not how you want to store versions, you can define a different regular expression with the pattern option.

    See also
    There are also other version scheme plug-ins, such as hatch-semver for semantic versioning.
With the version source plugin hatch-vcs you can also use Git tags:
[build-system]
requires = ["hatchling>=1.27", "hatch-vcs"]
...

[tool.hatch.version]
source = "vcs"
raw-options = { local_scheme = "no-local-version" }
The setuptools backend also allows dynamic versioning:
[build-system]
requires = ["setuptools>=77.0", "setuptools-scm"]
build-backend = "setuptools.build_meta"

[project]
...
dynamic = ["version"]

[tool.setuptools.dynamic]
version = {attr = "dataprep.VERSION"}
If you would like to make this version available in your package, you can use the following code:
src/dataprep/__init__.py¶

import importlib.metadata

try:
    __version__ = importlib.metadata.version(__name__)
except importlib.metadata.PackageNotFoundError:
    __version__ = "0.1.dev0"  # Fallback for development mode
Tip
If the version appears in several text files, using Bump My Version can be helpful.
The configuration file .bumpversion.toml can look like this, for example:

[tool.bumpversion]
current_version = "0.1.0"
parse = "(?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)"
serialize = ["{major}.{minor}.{patch}"]
search = "{current_version}"
replace = "{new_version}"
regex = false
ignore_missing_version = false
tag = false
sign_tags = false
tag_name = "v{new_version}"
tag_message = "Bump version: {current_version} → {new_version}"
allow_dirty = false
commit = false
message = "Bump version: {current_version} → {new_version}"
commit_args = ""

[[tool.bumpversion.files]]
filename = "src/dataprep/__init__.py"

[[tool.bumpversion.files]]
filename = "docs/conf.py"
authors
    is used to identify the authors of the package by name and email address. You can also list maintainers in the same format.
description
    is a short, one-sentence summary of the package.
readme
    is a path to a file containing a detailed description of the package. This is displayed on the package details page on the Python Package Index (PyPI). In this case, the description is loaded from README.rst.
license
    contains a valid SPDX license expression.
license-files
    specifies a list of files containing licence information.
requires-python
    specifies the versions of Python that are supported by your project. This causes installers like pip to search through older versions of packages until they find one with a matching Python version.
classifiers
    gives the Python Package Index (PyPI) and pip some additional metadata about your package. In this case, the package is only compatible with Python 3, is under the BSD licence and is OS independent. You should always specify at least the versions of Python your package runs under, the licence under which your package is available and the operating systems on which your package runs. You can find a complete list of classifiers at https://pypi.org/classifiers/.

    Classifiers also have a useful additional feature: to prevent a package from being uploaded to PyPI, use the special classifier "Private :: Do Not Upload". PyPI will always reject packages whose classifiers start with "Private ::".
dependencies
    specifies the dependencies for your package in an array.
urls
    lets you list any number of additional links that are displayed on the Python Package Index (PyPI). In general, these could lead to the source code, documentation, issue trackers and so on.
Error
If you receive the uv error error: No `project` table found in: `/PATH/TO/pyproject.toml`, you probably haven't defined a [project] section. This can occur in repositories that only use the pyproject.toml file to configure tools such as Black and Ruff but do not define the project itself. To resolve the issue, insert a [project] section containing at least name and version. Alternatively, you can use uv run with the --no-project option.
Dependency groups¶
dependency-groups
    allows you to specify dependency groups for your package and to distinguish between different sets. Recursive dependency groups are also possible: for dev, for example, you can take over all dependencies from the docs and tests groups in addition to pre-commit:

dependencies = [
    "cython",
    "pandas",
]

urls."Bug Tracker" = "https://github.com/veit/dataprep/issues"
urls."Homepage" = "https://github.com/veit/dataprep"
license = "BSD-3-Clause"
license-files = [ "LICENSE" ]

[dependency-groups]
dev = [
    "pre-commit",
    { include-group = "docs" },
    { include-group = "tests" },
]
docs = [
    # ...
]
tests = [
    # ...
]
You can install these dependency groups, for example with:
$ cd /PATH/TO/YOUR/DISTRIBUTION_PACKAGE
$ python3 -m venv .venv
$ . .venv/bin/activate
$ python -m pip install --upgrade pip
$ python -m pip install --group dev
> cd C:\PATH\TO\YOUR\DISTRIBUTION_PACKAGE
> python -m venv .venv
> .venv\Scripts\activate.bat
> python -m pip install --upgrade pip
> python -m pip install --group dev
src package¶
When you create a new package, you shouldn’t use a flat layout but the
src layout, which is also recommended in Packaging Python Projects of the
PyPA. A major advantage of this layout is that tests are run with the
installed version of your package and not with the files in your working
directory.
See also
Hynek Schlawack: Testing & Packaging
Note
In Python ≥ 3.11 PYTHONSAFEPATH can be used to ensure that the
installed packages are used first.
dataprep
    is the directory that contains the Python files. The name should match the project name to simplify configuration and be more recognisable to those installing the package.
__init__.py
    is required to import the directory as a package. This allows the following imports:

    import dataprep.loaders

    or

    from dataprep import loaders

    Although __init__.py files are often empty, they can also contain code.
loaders.py
    is an example of a module within the package that could contain the logic (functions, classes, variables, etc.) of your package.
Other files¶
CONTRIBUTORS.rst¶
LICENSE¶
You can find detailed information on this in the Licensing section.
README.rst¶
This file briefly tells those who are interested in the package how to use it.
If you write the document in reStructuredText, you can also include its contents as the detailed description of your package:

[project]
...
readme = "README.rst"

You can also include it in your Sphinx documentation with .. include:: ../../README.rst.
CHANGELOG.rst¶
Historical files or files needed for binary extensions¶
Before the pyproject.toml file introduced with PEP 518 became the
standard, setuptools required setup.py, setup.cfg and
MANIFEST.in. Today, however, these files are only needed for
binary extensions at best.
If you want to replace these files in your packages, you can do so with hatch
new --init or ini2toml.
setup.py¶
A minimal and yet functional dataprep/setup.py can look like this,
for example:
from Cython.Build import cythonize
from setuptools import find_packages, setup

setup(
    ext_modules=cythonize("src/dataprep/cymean.pyx"),
)
package_dir
points to the src directory, which can contain one or more packages. You can
then use setuptools’s find_packages()
to find all packages in this directory.
Note
find_packages() without a src/ directory would package all directories containing an __init__.py file, including tests/ directories.
setup.cfg¶
This file is no longer needed, at least not for packaging. wheel nowadays collects all required licence files automatically, and setuptools can build universal wheel packages with the options keyword argument, for example dataprep-0.1.0-py3-none-any.whl.
MANIFEST.in¶
The file contains all files and directories that are not already covered by packages or py_modules. It can look like this:
dataprep/MANIFEST.in:

include LICENSE *.rst *.toml *.yml *.yaml *.ini
graft src
recursive-exclude __pycache__ *.py[cod]

For more instructions on MANIFEST.in, see MANIFEST.in commands.
Note
People often forget to update the MANIFEST.in file. To avoid this, you can use check-manifest in a pre-commit hook.
Note
If you want files and directories from MANIFEST.in to be installed as
well, for example if they are runtime-relevant data, you can specify this
with include_package_data=True in your setup() call.
Create package structure¶
With uv init --package MYPACK you can easily create an initial file
structure for packages.
$ uv init --package mypack
$ tree mypack -a
mypack
├── .git
│ └── ...
├── .gitignore
├── .python-version
├── README.md
├── pyproject.toml
└── src
└── mypack
└── __init__.py
.python-version
    specifies which Python version should be used for developing the project.
Error
If you receive the error message error: The Python request from `.python-version` resolved to Python U.V.W, which is incompatible with the project's Python requirement: `>=X.Y`. Use `uv python pin` to update the `.python-version` file to a compatible version., this indicates a conflict between the version specified in the .python-version file and the requires-python specification in the pyproject.toml file. You now have three options:

- Update your .python-version file with uv python pin X.Y.Z.
- Override the Python version for a single command with uv run --python X.Y COMMAND.
- Update requires-python in pyproject.toml.
mypack/pyproject.toml
    The file pyproject.toml contains a scripts entry point mypack:main:

mypack/pyproject.toml¶
[build-system]
build-backend = "hatchling.build"
requires = [ "hatchling" ]

[project]
name = "mypack"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
authors = [
    { name = "Veit Schiele", email = "veit@cusy.io" },
]
requires-python = ">=3.13"
classifiers = [
    "Programming Language :: Python :: 3 :: Only",
    "Programming Language :: Python :: 3.13",
    "Programming Language :: Python :: 3.14",
]
dependencies = [ ]
scripts.mypack = "mypack:main"
mypack/src/mypack/__init__.py
    The module defines a CLI function main():

mypack/src/mypack/__init__.py¶
def main() -> None:
    print("Hello from mypack!")
It can be called with uv run:

$ uv run mypack
Hello from mypack!
Note
If necessary, uv run creates a virtual Python environment in the .venv folder before main() is executed.
Build¶
The next step is to create distribution packages for the package. These are
archives that can be uploaded to the Python Package Index (PyPI)
and installed by pip. Now execute the command in the same directory
where pyproject.toml is located:
$ uv build
Building source distribution...
Building wheel from source distribution...
Successfully built dist/mypack-0.1.0.tar.gz and dist/mypack-0.1.0-py3-none-any.whl
> uv build
Building source distribution...
Building wheel from source distribution...
Successfully built dist/mypack-0.1.0.tar.gz and dist/mypack-0.1.0-py3-none-any.whl
dist/mypack-0.1.0-py3-none-any.whl
    is a built distribution. pip prefers to install built distributions and only falls back to source distributions if no suitable built distribution is available. You should always upload a source distribution and provide built distributions for the platforms with which your project is compatible. In this case, our example package is compatible with Python on every platform, so only one built distribution is required:

    mypack
        is the normalised package name
    0.1.0
        is the version of the distribution package
    py3
        specifies the Python version and, if applicable, the C ABI
    none
        specifies whether the wheel package is suitable for every OS or only for specific ones
    any
        is suitable for any processor architecture, while x86_64, for example, is only for chips with the x86 instruction set and a 64-bit architecture
mypack-0.1.0.tar.gz
    is a source distribution.
Testing¶
You can then check the Wheel file with:
$ uv add --dev check-wheel-contents
Resolved 17 packages in 8ms
Built mypack @ file:///Users/veit/sandbox/mypack
Prepared 1 package in 442ms
Uninstalled 1 package in 0.89ms
Installed 10 packages in 5ms
+ annotated-types==0.7.0
+ attrs==24.2.0
+ check-wheel-contents==0.6.0
+ click==8.1.7
~ mypack==0.1.0 (from file:///Users/veit/sandbox/mypack)
+ packaging==24.1
+ pydantic==2.9.2
+ pydantic-core==2.23.4
+ typing-extensions==4.12.2
+ wheel-filename==1.4.1
$ uv run check-wheel-contents dist/*.whl
dist/mypack-0.1.0-py3-none-any.whl: OK
Alternatively, you can also install the package in a new project, for example in
myapp:
$ uv init --app myapp
$ cd myapp
$ uv add ../mypack/dist/mypack-0.1.0-py3-none-any.whl
Resolved 8 packages in 130ms
Installed 1 package in 3ms
+ mypack==0.1.0 (from file:///Users/veit/sandbox/mypack/dist/mypack-0.1.0-py3-none-any.whl)
You can then call mypack with uv run:
$ uv run mypack
Hello from mypack!
See also
Troubleshooting build failures: https://docs.astral.sh/uv/reference/troubleshooting/build-failures/
Note
There are still many instructions that include a step to call
setup.py, for example python setup.py sdist. However, this is
now considered an anti-pattern by parts of the
Python Packaging Authority (PyPA).
Checks¶
If you want to create a task management package that writes the tasks to a database and provides them via a Python API and a command line interface (CLI), how would you structure the files?
Think about how you want to fulfil the above tasks. Which libraries and modules can you think of that could fulfil this task? Sketch the code for the modules of the Python API, the command line interface and the database connection.