Pybind11 demystified. Ch 1: Dynamic library loading

Xianbo QIAN
Aug 27, 2022

Writing Python code is a lot of fun, but making it fast is sometimes painful. The most common way to boost performance is to identify hotspots and move them to C/C++ as a Python extension.

Writing Python extensions is not always easy. One thing that blows my mind very often is memory management: not only do I need to take care of allocating objects in C++, but also in Python.

Luckily, pybind11 provides a set of compelling interfaces that reduce the cognitive burden. But from time to time, while I enjoy the simplicity of its high-level interface, I wonder what's happening at the lower level, namely how pybind11 translates my templated C++ code into the appropriate Python C API calls.

So I started looking for a pybind11 internals guide or a technical paper on this topic. But none of the materials I found go deep enough to cover the design and implementation details. I spent quite some time reading the source code and put together some of my study notes here. Hopefully they're helpful to you as well.

Dynamic library loading

Before jumping into the details of pybind11, let's first look at some basic Linux concepts that make loading a Python extension possible. These concepts also apply to Mac and Windows with minor differences.

Unlike Python, extension code written in C must be compiled and linked before it's usable. The compile step is handled by the compiler (such as gcc or clang), which translates the .c source files into object files containing machine code. After compilation, each function becomes a symbol in the object file. You can use the nm tool to list them. The linker (such as ld) then combines the object files into a different output format, such as a binary that executes directly or a library that is loaded during process startup.

There are three common types of linkage: static, shared (or dynamic link), and dynamic (or dynamic load). This article gives a great explanation of the overall differences. If you want even more information, this complete guide to the runtime linker, written from an OS perspective, is worth reading. In short:

  • static: all symbols are combined in a single file.
    Python built-in modules are linked statically into the released Python interpreter.
    The dict type object PyDict_Type is a good example. You can verify it with:
    $ nm YOUR_PYTHON_LIBRARY.so | grep PyDict_Type
  • shared: the final binary file or a shared library can itself depend on one or more shared libraries.
    In this approach, some symbols are defined in the binary file while others live in the shared libraries. When the program is executed, a runtime linker recursively locates all the dependent libraries and maps a single copy of each into the process image, before passing control to the application (e.g., calling the main function).
    For example, the CUDA library is often loaded as a shared library in ML frameworks. The libc code used by the Python interpreter is another example. You can check this with either readelf or ldd:
    $ readelf -a YOUR_PYTHON_BINARY | grep libc
  • dynamic: both static and shared libraries, as described above, need to be known when the program is built. But the Python extension code you write is unknown to the people who built the Python interpreter, so extension code requires a different linkage mode that offers more flexibility.
    A dynamic library allows symbols to be loaded at runtime by name, using a string that can be generated on the fly, with the help of dlopen and dlsym. Developers need to be highly cautious to get the name and types right. Otherwise, the program could crash with a hard segmentation fault.
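
The static case can also be observed from inside Python. The sketch below is only an illustration: linkage_of is a hypothetical helper, and it assumes CPython, where sys.builtin_module_names lists the modules compiled into the interpreter and importlib.machinery.EXTENSION_SUFFIXES lists the file suffixes used by loadable extension modules.

```python
import importlib
import importlib.machinery
import sys


def linkage_of(module_name):
    """Classify how a module's code gets into the process (hypothetical helper)."""
    # Modules named here are compiled into the interpreter binary itself.
    if module_name in sys.builtin_module_names:
        return "built-in"
    module = importlib.import_module(module_name)
    spec = getattr(module, "__spec__", None)
    origin = getattr(spec, "origin", None) or ""
    # Extension modules live in shared-object files that are loaded at import time.
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"
    # Anything else: pure-Python source, frozen modules, and so on.
    return "other"


for name in ("sys", "json", "math"):
    print(name, "->", linkage_of(name))
```

Which modules end up built-in and which ship as separate extension files depends on how your interpreter was built, so the result for a module such as math can differ between installations.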

To clarify: a shared library and a dynamic library are built the same way, as a .so file, but the way their symbols are loaded into memory is different. A shared library is loaded at process startup, while a dynamic library is loaded on demand when dlopen is called.
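
To get a feel for on-demand loading without writing any C, here is a minimal sketch using Python's ctypes module, which on POSIX systems is built on dlopen and dlsym. It assumes a POSIX system (ctypes.CDLL(None) is not available on Windows), and load_function is a hypothetical helper. Note that the symbol name is an ordinary string built at runtime, and that the argument and return types have to be declared by hand, which is exactly where mistakes creep in.

```python
import ctypes

# Assumption: a POSIX system. CDLL(None) asks dlopen for a handle to the
# running process, so symbols from already-loaded libraries (libc) resolve.
process = ctypes.CDLL(None)


def load_function(name, argtypes, restype):
    """Look up a C function by name at runtime (hypothetical helper)."""
    func = getattr(process, name)  # raises AttributeError if the symbol is missing
    # Nothing checks these against the real C signature; wrong types can crash.
    func.argtypes = argtypes
    func.restype = restype
    return func


symbol_name = "str" + "len"  # the name is just a string built at runtime
strlen = load_function(symbol_name, [ctypes.c_char_p], ctypes.c_size_t)
print(strlen(b"pybind11"))
```

A misspelled name fails cleanly with an AttributeError, but a wrong argtypes or restype is not detected at all, which is how a bad declaration turns into a segmentation fault.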

Now let's see some concrete examples of how each of them works.

You can also clone the code from this gist. https://gist.github.com/xianbaoqian/10dae2219db91122bac981263ea0c27c

>> Chapter 2: Hack Python built-in modules
