Binary Extensions
=================

One of the features of the CPython interpreter is that, in addition to
executing Python code, it also provides a rich C API for use by other software.
One of the most common uses of this C API is to create importable C extensions
that make things possible that are difficult to achieve in pure Python code.

Use Cases
---------

The typical use cases for binary extensions can be divided into three
categories:

Accelerator modules
    These modules are stand-alone and are created solely to run faster than the
    corresponding pure Python code. Ideally, an accelerator module always has a
    Python equivalent that can be used as a fallback if the accelerated version
    is not available on a particular system. The CPython standard library uses
    many accelerator modules.

Wrapper modules
    These modules are created to make existing C interfaces available in
    Python. You can either expose the underlying C interfaces directly or
    provide a *Pythonic* API that uses features of Python to make the API
    easier to use. The CPython standard library makes extensive use of wrapper
    modules.

Low-level system access
    These modules are created to access functions of the CPython runtime
    environment, the operating system or the underlying hardware.
    Platform-specific code can achieve things that would not be possible with
    pure Python code. A number of CPython standard library modules are written
    in C to access interpreter internals that are not available at the language
    level.

    A particularly noteworthy property of C extensions is that they can release
    the Global Interpreter Lock (GIL) of CPython for long-running operations,
    regardless of whether these operations are CPU or IO-bound.

Not all extension modules fit exactly into the above categories.
For example, the extension modules contained in `NumPy `_ cover all three use
cases:

* They move inner loops to C for speed reasons,
* wrap external libraries in C, FORTRAN and other languages and
* use low-level system interfaces of CPython and the underlying operating
  system to support the concurrent execution of vectorised operations and to
  precisely control the memory layout of the objects created.

Disadvantages
-------------

In the past, the main disadvantage of binary extensions was that they made it
difficult to distribute the software. Today, thanks to the :term:`wheel`
format, this disadvantage hardly exists. However, some disadvantages remain:

* Installation from source remains complicated.
* There may be no suitable :term:`wheel` for the particular build of the
  CPython interpreter in use, or for alternative interpreters such as
  `PyPy `__, `IronPython `_ or `Jython `_.
* Maintaining the packages is more time-consuming because the maintainers have
  to be familiar not only with Python but also with another language and the
  CPython C API. The complexity increases further if a Python fallback
  implementation is provided in addition to the binary extension.
* Finally, some import mechanisms, such as direct import from ZIP files, often
  do not work for extension modules.

Alternatives
------------

… to accelerator modules
~~~~~~~~~~~~~~~~~~~~~~~~

If extension modules are only used to make code run faster, a number of
alternatives should also be considered:

* Look for existing optimised alternatives. The CPython standard library
  contains a number of optimised data structures and algorithms, especially in
  the builtins and the modules ``collections`` and ``itertools``. Occasionally
  the :term:`Python Package Index` (:term:`PyPI`) also offers additional
  alternatives. Sometimes a third-party module can avoid the need to create
  your own accelerator module.
* For long-running applications, the JIT-compiled `PyPy `__ interpreter can be
  a suitable alternative to the standard CPython. The main difficulty with
  adopting PyPy is typically the dependence on other binary extension modules.
  While PyPy emulates the CPython C API, modules that rely on it cause problems
  for the PyPy JIT, and the emulation often exposes defects in extension
  modules that CPython tolerates (often reference counting errors).
* `Cython `__ is a sophisticated static compiler that can compile most Python
  code into C extension modules. The initial compilation offers some speed
  increases (by bypassing the CPython interpreter level), and Cython’s optional
  static typing features can provide additional speed increases. For Python
  programmers, Cython offers a lower barrier to entry than other languages such
  as C or C++. However, using Cython has the disadvantage of adding complexity
  to the distribution of the resulting application.
* `Numba `_ is a newer tool that uses the `LLVM compiler infrastructure `_ to
  selectively compile parts of a Python application to native machine code at
  runtime. It requires LLVM to be available on the system the code is running
  on. It can lead to considerable increases in speed, especially for
  vectorisable operations.

… to wrapper modules
~~~~~~~~~~~~~~~~~~~~

The C-ABI (`Application Binary Interface `_) is a standard for sharing
functionality between multiple applications. One of the strengths of the
CPython C-API (`Application Programming Interface `_) is that Python users can
take advantage of this functionality. However, wrapping modules by hand is very
tedious, so a number of alternatives should be considered.

The approaches described below do not simplify distribution, but they can
significantly reduce the maintenance effort compared to hand-written wrapper
modules.

* `Cython `__ is useful not only for creating accelerator modules, but also for
  creating wrapper modules. Since the API still needs to be wrapped by hand, it
  is not a good choice when wrapping large APIs.
* `cffi `_ is a project by some `PyPy `__ developers that lets developers who
  already know both Python and C make their C modules available to Python
  applications. It also makes wrapping a C module based on its header files
  relatively easy, even if you are not familiar with C itself.

  One of the main advantages of cffi is that it is compatible with the PyPy JIT
  so that cffi wrapper modules can fully participate in the PyPy tracing JIT
  optimisations.
* `SWIG `_ is a wrapper interface generator that connects a variety of
  programming languages, including Python, with C and C++ code.
* The ``ctypes`` module of the standard library is useful for accessing C
  interfaces when header information is not available. However, it works only
  at the C ABI level and therefore provides no automatic consistency check
  between the interface exported by the library and the one declared in the
  Python code. In contrast, the alternatives above can all work at the C API
  level and use C header files to ensure consistency.
* `pythoncapi_compat `_ can be used to write a C extension that supports
  multiple Python versions with a single code base. It consists of the header
  file :file:`pythoncapi_compat.h` and the script
  :file:`upgrade_pythoncapi.py`.

… for low-level system access
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For applications that require low-level system access, a binary extension is
often the best option. This applies in particular to low-level access to the
CPython runtime, since some operations (such as releasing the Global
Interpreter Lock (GIL)) are not permitted while the interpreter itself is
executing code, even when modules such as ``ctypes`` or ``cffi`` are used to
get access to the relevant C API interfaces.
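
As a minimal sketch of what ABI-level access with ``ctypes`` looks like, the
following example calls ``strlen`` from the C standard library. It assumes a
POSIX system on which the C library can be located; the wrapper name
``c_strlen`` is hypothetical and only serves the illustration. Because
``ctypes`` cannot read header files, the argument and return types have to be
declared by hand, and nothing checks them against the actual C declaration:

.. code-block:: python

   import ctypes
   import ctypes.util

   # Locate the C standard library. If no name is found, fall back to the
   # symbols already loaded into the current process (assumption: POSIX).
   libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

   # The signature is declared manually; ctypes takes it on trust.
   libc.strlen.argtypes = [ctypes.c_char_p]
   libc.strlen.restype = ctypes.c_size_t


   def c_strlen(text: str) -> int:
       """Return the length in bytes of the UTF-8 encoded text."""
       return libc.strlen(text.encode("utf-8"))


   print(c_strlen("dataprep"))

If the declared types did not match the real signature, ``ctypes`` would not
notice. This is the consistency problem that tools working from the C header
files are meant to avoid.
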

In cases where the extension module is manipulating the underlying operating
system or hardware (instead of the CPython runtime), it is sometimes better to
write a normal C library (or a library in another programming language such as
C++ or Rust that provides a C-compatible ABI) and then use one of the wrapping
techniques described above to make the interface available as an importable
Python module.

Implementation
--------------

We now want to extend our ``dataprep`` package and integrate some C code. For
this we use `Cython `__ to translate the Python code from
:download:`dataprep/src/dataprep/cymean.pyx` into optimised C code during the
build process. Cython files have the suffix ``pyx`` and can contain both Python
and C code.

However, we cannot currently use ``hatchling.build`` as the build backend for
this, so we fall back on a current version of :term:`setuptools`:

.. literalinclude:: dataprep/pyproject.toml
   :language: toml
   :lines: 19-22
   :lineno-start: 19
   :emphasize-lines: 2

:term:`setuptools` uses :download:`dataprep/setup.py` to include non-Python
files in a package.

.. literalinclude:: dataprep/setup.py
   :language: python

.. note::
   `extensionlib `_ is a toolkit for extension modules, but it does not yet
   contain a ``hatchling`` plugin.

.. note::
   Alternatively, you could use :term:`Meson ` or :term:`scikit-build`:

   .. tab:: Meson

      .. code-block:: toml

         [build-system]
         requires = ["meson-python"]
         build-backend = "mesonpy"

   .. tab:: scikit-build

      .. code-block:: toml

         [build-system]
         requires = ["scikit-build-core"]
         build-backend = "scikit_build_core.build"

Since Cython itself is a Python package, it can simply be added to the list of
dependencies in the :download:`dataprep/pyproject.toml` file:

.. literalinclude:: dataprep/pyproject.toml
   :language: toml
   :lines: 2
   :lineno-start: 2

Now you can run the build process with the ``pyproject-build`` command and
check whether the Cython file ends up in the package as expected:

.. code-block:: console

   $ pyproject-build .
   * Creating venv isolated environment...
   * Installing packages in isolated environment... (cython, setuptools>=40.6.0, wheel)
   * Getting dependencies for sdist...
   Compiling src/dataprep/cymean.pyx because it changed.
   [1/1] Cythonizing src/dataprep/cymean.pyx
   …
   copying src/dataprep/cymean.c -> dataprep-0.1.0/src/dataprep
   copying src/dataprep/cymean.pyx -> dataprep-0.1.0/src/dataprep
   …
   running build_ext
   building 'dataprep.cymean' extension
   …
   Successfully built dataprep-0.1.0.tar.gz and dataprep-0.1.0-cp39-cp39-macosx_10_9_x86_64.whl

Finally, we can check our package with ``check-wheel-contents``:

.. code-block:: console

   $ check-wheel-contents dataprep/dist/*.whl
   dataprep/dist/dataprep-0.1.0-cp39-cp39-macosx_10_9_x86_64.whl: OK

Alternatively, you can install our ``dataprep`` package and use ``mean``:

.. code-block:: console

   $ python -m pip install dataprep/dist/dataprep-0.1.0-cp39-cp39-macosx_10_9_x86_64.whl
   $ python

.. code-block:: pycon

   >>> from dataprep.mean import mean
   >>> from random import randint
   >>> nums = [randint(1, 1_000) for _ in range(1_000_000)]
   >>> mean(nums)
   500097.867198

The ``random.randint`` function was used to create a list of one million random
numbers with values between 1 and 1000.

.. seealso::
   The `CPython Extending and Embedding guide `_ contains an introduction to
   writing your own extension modules in C: `Extending Python with C or C++ `_.
   However, note that this introduction only discusses the basic tools for
   creating extensions that are provided as part of CPython. Third-party tools
   such as `Cython `__, `cffi `_, `SWIG `__, and `Numba `__ offer both simpler
   and more sophisticated approaches to building C and C++ extensions for
   Python.

   `Python Packaging User Guide: Binary Extensions `_ not only covers various
   available tools that simplify the creation of binary extensions, but also
   explains the various reasons why creating an extension module might be
   desirable.
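
The fallback idea mentioned for accelerator modules can be combined with a
package like the one built in this section. The following is only a sketch: it
assumes that the compiled module ``dataprep.cymean`` exposes a function
``mean``, which is not confirmed here, and it defines its own pure Python
fallback (``_py_mean`` and ``load_mean`` are hypothetical helpers) for systems
on which the compiled module cannot be imported:

.. code-block:: python

   import importlib


   def _py_mean(values):
       """Pure Python fallback (hypothetical helper, not part of dataprep)."""
       values = list(values)
       if not values:
           raise ValueError("mean() requires at least one value")
       return sum(values) / len(values)


   def load_mean():
       """Return the accelerated mean if it can be imported, else the fallback."""
       try:
           # Assumption: the compiled Cython module exposes a ``mean`` function.
           module = importlib.import_module("dataprep.cymean")
           return module.mean
       except (ImportError, AttributeError):
           return _py_mean


   mean = load_mean()
   print(mean([1, 2, 3, 4]))

Callers only ever use ``mean`` and do not need to know which implementation
was selected.
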

Creating binary extensions
--------------------------

Binary extensions for Windows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before you can create a binary extension, you have to make sure that a suitable
compiler is available. On Windows, Visual C is used to build the official
CPython interpreter, and it should also be used to build compatible binary
extensions:

For Python ≥ 3.5 install `Visual Studio Code `_ with `Python Extension `_

.. note::
   Visual Studio is backwards compatible from Python 3.5 onwards, which means
   that any future version of Visual Studio can build Python extensions for all
   Python versions from 3.5 onwards.

Building with the recommended compiler on Windows ensures that a compatible C
library is used throughout the Python process.

Binary Extensions for Linux
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Linux binaries must use a sufficiently old glibc to be compatible with older
distributions. `Distrowatch `_ lists in tabular form which versions of the
distributions ship which version of the library:

* `Red Hat Enterprise Linux `_
* `Debian `_
* `Ubuntu `_
* …

The `PYPA/Manylinux `_ project facilitates the distribution of binary
extensions as :term:`Wheels ` for most Linux platforms. This also resulted in
:pep:`513`, which defines the ``manylinux1_x86_64`` and ``manylinux1_i686``
platform tags.

Binary Extensions for Mac
~~~~~~~~~~~~~~~~~~~~~~~~~

Binary compatibility on macOS is determined by the minimum deployment target,
e.g. *10.9*, which is defined in the environment variable
``MACOSX_DEPLOYMENT_TARGET``. When building with setuptools/distutils, the
deployment target is specified with the flag ``--plat-name``, for example
``macosx-10.9-x86_64``. For more information on deployment targets for macOS
Python distributions, see the `MacPython Spinning Wheels-Wiki `_.

Deployment of binary extensions
-------------------------------

The following describes deployment on the :term:`Python Package Index`
(:term:`PyPI`) or another index.

.. note::
   When deploying for Linux distributions, note that they place their own
   requirements on the build system. Therefore,
   :term:`Source Distributions (sdist) ` should also be provided in addition to
   :term:`Wheels `.

.. seealso::
   * `Deploying Python applications `_
   * `Supporting Windows using Appveyor `_