Pybind11 demystified. Ch 2: Hack Python built-in modules
Writing Python extensions is one way to enhance Python, but you can also fork the CPython repository and build your own version of the interpreter.
In practice, as a user you’ll rarely need to build Python yourself. But it’s worth playing with the build to get familiar with the CPython code base and to understand what the Python C API does under the hood. This experience will help you write better extension code and give you more confidence when judging whether your code is correct.
Build Python
git clone git@github.com:python/cpython.git --depth=1
cd cpython
mkdir debug # Built artifacts are stored in the debug folder
cd debug
../configure --with-pydebug --enable-optimizations --with-lto # PGO and LTO make the build much slower; drop these two flags for quicker rebuilds
make -j8 # change 8 to the number of CPU cores on your workstation
Once built, you’ll find an executable named `python` in the current directory (the `debug` folder). On case-insensitive file systems such as the macOS default, it is named `python.exe` instead.
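To confirm that the interpreter you just built is really a debug build, you can run a small check with it. This is a sketch that relies on one assumption: `sys.gettotalrefcount` is only present when CPython is configured with `--with-pydebug`. The helper name `is_debug_build` is made up for illustration.

```python
import sys
import sysconfig


def is_debug_build() -> bool:
    # Assumption: sys.gettotalrefcount only exists in --with-pydebug builds.
    return hasattr(sys, "gettotalrefcount")


print("executable:", sys.executable)
print("debug build:", is_debug_build())
# CONFIG_ARGS records the ./configure flags on POSIX builds (None elsewhere).
print("configure args:", sysconfig.get_config_var("CONFIG_ARGS"))
```

Save it to a file and run it with `./python`; a distribution-provided Python will typically report `debug build: False`.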
Update the implementation of a Python built-in module
Now let’s tweak the built-in `dict` type by changing its docstring, which is defined in the C variable `dictionary_doc` in `Objects/dictobject.c`.
# You should be in the `debug` folder
# Edit Objects/dictobject.c and update `dictionary_doc`
vim ../Objects/dictobject.c
make -j8 build_all
./python
>>> print(dict.__doc__) <- You should be able to see the new string
Congratulations! You’ve just made the first change to the Python interpreter!
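Why rebuild at all? Built-in types such as `dict` are static C types, so their attributes, including `__doc__`, cannot be reassigned from Python code. The sketch below (the helper `try_patch_dict_doc` is hypothetical) shows that attempt failing with a `TypeError` on a stock interpreter, which is why the docstring has to be changed in `dictionary_doc` and compiled in.

```python
def try_patch_dict_doc(new_doc: str) -> bool:
    """Return True if dict.__doc__ could be replaced from Python code."""
    try:
        dict.__doc__ = new_doc
    except TypeError:
        # Static built-in types are immutable from Python; the docstring
        # lives in C, so changing it requires rebuilding the interpreter.
        return False
    return True


print("current first line:", dict.__doc__.splitlines()[0])
print("patched from Python?", try_patch_dict_doc("My custom dict docstring"))
```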
Tailoring built-in Python modules
We can also remove specific built-in modules by editing the `Setup` configuration files (here, `Modules/Setup.bootstrap`).
# You should be in the `debug` folder
# Comment out the last line (pwd module) by prepending #
vi Modules/Setup.bootstrap
# Rebuild Python
make -j8 build_all
# Now let's verify that the pwd module no longer exists
./python
>>> import pwd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pwd'
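A quick way to check a module’s status without triggering a traceback is sketched below. It assumes that modules compiled in through the `Setup` files appear in `sys.builtin_module_names`; the helper name `module_status` is hypothetical.

```python
import importlib.util
import sys


def module_status(name: str) -> str:
    """Classify a top-level module as built-in, importable, or missing."""
    if name in sys.builtin_module_names:
        # Compiled directly into the interpreter binary.
        return "built-in"
    if importlib.util.find_spec(name) is not None:
        # Found on sys.path, e.g. a .py file or a shared extension module.
        return "importable (not built-in)"
    return "missing"


for name in ("sys", "pwd", "xx"):
    print(f"{name}: {module_status(name)}")
```

If the removal worked, running this with the rebuilt `./python` should report `pwd: missing`.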
Adding a new built-in module
The CPython code base already includes a comprehensive example of a built-in module, `Modules/xxmodule.c` (note that this file lives in the source tree, i.e. the parent folder, not in the `debug` folder). It can also serve as a template when writing new modules.
Let’s enable `xxmodule` so you can see it in action. I highly recommend reading through its source code: pybind11 is essentially a C++ wrapper around this kind of C code, and later in the series you’ll see how the wrapper is designed.
# Still in the `debug` folder
# Enable xxmodule by adding a new line with content `xx xxmodule.c`.
# Make sure the file ends with a newline
vi Modules/Setup.local
# Rebuild Python
make -j8 build_all
./python
>>> import xx
>>> xx.Str('hi')
'hi'
>>> xx.bug([1,2,3])
1
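If you prefer to script the `Modules/Setup.local` edit instead of opening vi, here is a minimal sketch. The helper name `enable_builtin_module` is hypothetical, and it assumes you pass the path relative to the `debug` folder. It appends the `xx xxmodule.c` entry only once and always leaves the file ending with a newline, following the tip above.

```python
from pathlib import Path


def enable_builtin_module(setup_local: Path, entry: str = "xx xxmodule.c") -> bool:
    """Append `entry` to a Setup.local-style file unless it is already there.

    Returns True if the file was modified.
    """
    text = setup_local.read_text() if setup_local.exists() else ""
    if entry.strip() in (line.strip() for line in text.splitlines()):
        return False  # already enabled; leave the file unchanged
    if text and not text.endswith("\n"):
        text += "\n"  # terminate the previous last line first
    # Always finish the file with a trailing newline.
    setup_local.write_text(text + entry + "\n")
    return True
```

From the `debug` folder, call `enable_builtin_module(Path("Modules/Setup.local"))` and then rerun `make -j8 build_all`.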
Quiz: After reading the source code, can you spot the bug in the `bug` function?
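Answer: This is a subtle bug related to reference counting. `item` is a borrowed reference obtained from `PyList_GetItem`, and `PyList_SetItem(list, 1, ...)` drops the list’s reference to the old `list[1]`. That can run arbitrary Python code (a `__del__` method) which frees `item` before it is printed. The “Thin Ice” section of the official “Extending and Embedding the Python Interpreter” tutorial explains this in detail. Below is some example code that triggers the bug: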
>>> class ValueA:
...     def __del__(self):
...         print("A is removed\n")
...
>>> class ValueB:
...     def __del__(self):
...         print("B is removed\n")
...         global input
...         del input[0]
...
>>> input = [ValueA(), ValueB()]
>>> xx.bug(input)
B is removed

A is removed

<refcnt -2459565876494606883 at 0x7f8b47e61c30> <- weird output
>>> input
[0]
>>> input = [ValueA(), ValueB()]
>>> input
[<__main__.ValueA object at 0x7f8b47b4fbc0>, <__main__.ValueB object at 0x7f8b47b4edb0>]
>>> input[0]
<__main__.ValueA object at 0x7f8b47b4fbc0> <- expected output
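For contrast, here is a pure-Python transliteration of what `xx.bug` does (the names `bug_in_python` and the `owner` attribute are made up for illustration). In Python, binding `item = lst[0]` creates a strong reference, the equivalent of calling `Py_INCREF(item)` before `PyList_SetItem` in C, so on CPython the object survives even when `ValueB.__del__` removes it from the list.

```python
class ValueA:
    def __del__(self):
        print("A is removed")


class ValueB:
    def __init__(self, owner):
        # Hypothetical: keep a handle on the list so __del__ can mutate it.
        self.owner = owner

    def __del__(self):
        print("B is removed")
        del self.owner[0]  # drops the list's reference to the ValueA object


def bug_in_python(lst):
    item = lst[0]  # strong reference: like PyList_GetItem + Py_INCREF
    lst[1] = 0     # releases the old lst[1], which runs ValueB.__del__
    print(item)    # safe: `item` still keeps the ValueA object alive
    return item


values = [ValueA(), None]
values[1] = ValueB(values)
survivor = bug_in_python(values)
print(values)
```

Unlike the C version, this prints a normal `ValueA` repr, and `A is removed` only appears once `survivor` is released.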
If you’re interested, the same folder contains other examples for reference, such as `xxlimited.c` and `xxsubtype.c`.