Pybind11 demystified. Ch 4.1: positional only args

Xianbo QIAN
Sep 4, 2022

Python doesn't support function overloading the way C++ does (different function signatures sharing the same function name). Still, Python does provide a handful of syntactic features, including default values and kwargs, that allow mutating a function's behavior based on the arguments it receives.

This chapter will explore different ways of handling Python-style function overloading when writing C extensions.

Ch 4.1: positional only args

Depending on whether the extension function accepts keyword arguments, it will be registered with either the METH_VARARGS or the METH_VARARGS | METH_KEYWORDS calling convention. Details can be found in the CPython documentation on calling conventions.

In this chapter, we'll focus on functions with positional-only arguments, i.e. the METH_VARARGS convention.
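
For reference, the two conventions correspond to two different handler signatures on the C side. Here is a minimal sketch; the function names are just placeholders:

static PyObject *positional_only(PyObject *self, PyObject *args);                 /* METH_VARARGS */
static PyObject *with_keywords(PyObject *self, PyObject *args, PyObject *kwargs); /* METH_VARARGS | METH_KEYWORDS */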

Let's have a look at our xxmodule.c:

static PyMethodDef xx_methods[] = {
    {"roj", xx_roj, METH_VARARGS,
        PyDoc_STR("roj(a,b) -> None")},
    {"foo", xx_foo, METH_VARARGS,
        xx_foo_doc},
    {"new", xx_new, METH_VARARGS,
        PyDoc_STR("new() -> new Xx object")},
    {"bug", xx_bug, METH_VARARGS,
        PyDoc_STR("bug(o) -> None")},
    {NULL, NULL} /* sentinel */
};

This list defines all of the module's functions. Let's add a new one by extending the list. In my case, I call it xx_arg, and the skeleton implementation is the following:

static PyObject *
xx_arg(PyObject *self, PyObject *args)
{
    Py_IncRef(Py_None);
    return Py_None;
}

Note that whatever is returned from the C extension function will be captured by a variable on the Python side, and its reference count will be decreased when that variable goes out of scope. So we need to systematically increase the reference count of the returned value by 1 to avoid an eventually negative reference count. The same applies to None, which is itself a regular Python object (Py_None).
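
As a side note, CPython provides the Py_RETURN_NONE macro, which bundles exactly this incref-and-return pair, so the skeleton body could also be written as:

static PyObject *
xx_arg(PyObject *self, PyObject *args)
{
    Py_RETURN_NONE; /* increments Py_None's refcount and returns it */
}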

Instead of returning None, we can also return args to see what it is.

static PyObject *
xx_arg(PyObject *self, PyObject *args)
{
    Py_IncRef(args);
    return args;
}

Then in Python:

>>> import xx
>>> print(xx.arg(9.,2.5))
(9.0, 2.5)

OK. args is just a tuple of the input arguments, basically *args.

We can also check the type of a PyObject with the Py*_Check functions, e.g. PyTuple_Check.
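
In a real extension, such a check is typically paired with a TypeError. A rough sketch, where item stands for whatever hypothetical PyObject* you want to validate:

if (!PyTuple_Check(item))
{
    PyErr_SetString(PyExc_TypeError, "expected a tuple");
    return NULL; /* returning NULL tells the interpreter an exception was set */
}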

Another way of inspecting a variable's type in the C extension is to leverage PyTypeObject:

printf("Type: %s\n", Py_TYPE(args)->tp_name);

Or even better, with PyObject_Print, you can print the object's string representation.
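
For instance, this one-liner dumps the repr of args to stdout (a quick sketch):

PyObject_Print(args, stdout, 0); /* pass Py_PRINT_RAW as the flag to get str() instead of repr() */
printf("\n");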

Once we're sure that the variable is an instance of PyTuple, we can get its size with PyTuple_Size. We can also get its i-th item with PyTuple_GET_ITEM.

Putting all these together, we can have the second version as the following:

static PyObject *
xx_arg(PyObject *self, PyObject *args)
{
    assert(PyTuple_Check(args));
    Py_ssize_t count = PyTuple_Size(args);
    for (Py_ssize_t i = 0; i < count; i++)
    {
        PyObject *element = PyTuple_GET_ITEM(args, i); /* borrowed reference */
        printf("Type: %s\n", Py_TYPE(element)->tp_name);
    }
    Py_IncRef(Py_None);
    return Py_None;
}

That looks great, but it's pretty lengthy. It would be nicer if I could define the function with plain C/C++ typed arguments instead of manually converting each PyObject in the tuple to its corresponding type, which is precisely what Pybind11 does, with a fair amount of complicated C++ template machinery.

Let's do a minimal re-implementation to make things clear.

The first thing we need to do is rename xxmodule.c to xxmodule.cc and compile it with g++ instead of gcc.
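
Depending on your setup, a build command along these lines should do the trick (this is an assumption about your environment, so adapt it to however you already build xxmodule; the -std=c++17 flag matters for what comes next):

g++ -std=c++17 -shared -fPIC $(python3-config --includes) xxmodule.cc -o xx$(python3-config --extension-suffix)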

Then we need to define the actual function body in C/C++ style.

int func(int a, float b)
{
    return a + b + 1;
}

Finally, we need to do the hard part: converting the PyObject* elements of the tuple to their corresponding C types.

For each type, let's define a Process function that transforms a PyObject* into the required type. Here is my demo implementation:

template <typename Arg>
Arg Process(PyObject *op);
template <>
int Process<int>(PyObject *op)
{
    int v = _PyLong_AsInt(op);
    printf("Process int: %d\n", v);
    return v;
}
template <>
long Process<long>(PyObject *op)
{
    long v = PyLong_AsLong(op);
    printf("Process long: %ld\n", v);
    return v;
}
template <>
float Process<float>(PyObject *op)
{
    float v = (float)PyFloat_AsDouble(op);
    printf("Process float: %f\n", v);
    return v;
}

Let's then put each processed PyObject into a tuple and pass that to the target function f. C++17's std::apply is very handy for this. Here is the complete code:

#include <tuple>   /* std::apply, std::make_tuple */
#include <utility> /* std::index_sequence, std::make_index_sequence */

template <typename Return, typename... Args, std::size_t... Idx>
Return _expand(Return (*f)(Args...), PyObject *tuple, std::index_sequence<Idx...>)
{
    return std::apply(f, std::make_tuple(Process<Args>(PyTuple_GetItem(tuple, Idx))...));
}
template <typename Return, typename... Args>
Return expand(Return (*f)(Args...), PyObject *tuple)
{
    return _expand(f, tuple, std::make_index_sequence<sizeof...(Args)>());
}

It seems a bit complicated to understand at first sight, but in fact, it's pretty straightforward.

std::make_index_sequence<sizeof...(Args)>() produces the compile-time sequence 0, 1, 2, ..., N-1, where N is the number of arguments of the target function f.

std::make_tuple(Process<Args>(PyTuple_GetItem(tuple, Idx))...) means that we're creating a tuple of processed elements. The general rule of thumb for ... is that the expression right before the dots is repeated once for each element of the parameter pack. The expanded form looks like the following:

std::make_tuple(
    Process<Arg0>(PyTuple_GetItem(tuple, 0)),
    Process<Arg1>(PyTuple_GetItem(tuple, 1)),
    Process<Arg2>(PyTuple_GetItem(tuple, 2)),
    ...
    Process<ArgN-1>(PyTuple_GetItem(tuple, N-1)))

std::apply then unpacks this tuple and applies f to its elements, i.e.

f(
    Process<Arg0>(PyTuple_GetItem(tuple, 0)),
    Process<Arg1>(PyTuple_GetItem(tuple, 1)),
    Process<Arg2>(PyTuple_GetItem(tuple, 2)),
    ...
    Process<ArgN-1>(PyTuple_GetItem(tuple, N-1)))

Pretty cool, isn't it?

But we're missing something… How about type checking? What if the passed arguments are incompatible with the required C types?

Let's leave the type checking as an exercise for you. With the above example, you should be able to write that code.

(Don't turn the page until you have finished the exercise.)

Here is a reference implementation. I'm sure you can come up with a better version, for example one with more meaningful error reporting (please post your version in the comments!).

#include <array> /* std::array holds the per-argument check results */

template <typename Arg>
bool Check(PyObject *)
{
    return false;
}
template <>
bool Check<int>(PyObject *op)
{
    return PyLong_CheckExact(op);
}
template <>
bool Check<long>(PyObject *op)
{
    return PyLong_CheckExact(op);
}
template <>
bool Check<float>(PyObject *op)
{
    return PyFloat_CheckExact(op);
}
template <typename Return, typename... Args, std::size_t... Idx>
Return _expand(Return (*f)(Args...), PyObject *tuple, std::index_sequence<Idx...>)
{
    std::array<bool, sizeof...(Args)> checks = {Check<Args>(PyTuple_GetItem(tuple, Idx))...};
    for (std::size_t i = 0; i < checks.size(); i++)
    {
        if (!checks[i])
        {
            printf("Found issue with arg %zu\n", i);
            return Return();
        }
    }

    return std::apply(f, std::make_tuple(Process<Args>(PyTuple_GetItem(tuple, Idx))...));
}
template <typename Return, typename... Args>
Return expand(Return (*f)(Args...), PyObject *tuple)
{
    return _expand(f, tuple, std::make_index_sequence<sizeof...(Args)>());
}

OK, that's about it for this chapter. Now give it a try by calling expand(YOUR_TARGET_FUNCTION, ARGS_TUPLE) in your newly created function.
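
For example, with the func and expand defined above, the wiring could look roughly like this (converting the C return value back to a Python int is my own addition; pick whatever fits your target function):

static PyObject *
xx_arg(PyObject *self, PyObject *args)
{
    int result = expand(func, args); /* convert the tuple and dispatch to func */
    return PyLong_FromLong(result);  /* hand a new Python int back to the interpreter */
}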

You might have noticed that this calling convention requires building an intermediate Python tuple, whereas a low-level C-style array would have been enough. https://bugs.python.org/issue29259 proposed a new calling convention called fastcall. We'll cover that shortly.

Ch 4.2: TBC
