CUDA in Python C/C++ extensions

I want to write a Python extension in C++ (see "Extending Python with C or C++" in the Python 3.9.5 documentation), but I want the C++ extension to use CUDA. There aren't many how-tos on this online, and the ones I've found are fragmented and very dated. I'm not sure whether this route to CUDA is fully supported, or what the implications are. Generic Python bindings for CUDA, such as Numba, are not an option.

I would greatly appreciate it if someone could give a breakdown of how to use CUDA in Python C/C++ extensions.

I don't have any references for Python extensions specifically (you will probably find some with a Google search). However, the beginning of the write-up you linked suggests that there may be good reasons to use Python ctypes or Cython instead.

The GPU array library CuPy is built using Cython, so its source code contains a large body of worked examples. There is also a nice introductory blog post here, although it mainly covers Cython rather than CUDA. Here is a Cython/CUDA example.
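
Whichever front end you use (a hand-written C++ module or a Cython .pyx file), the usual wiring is the same: the kernel and its <<<...>>> launch live in a .cu file compiled by nvcc, because the host C++ compiler that setuptools invokes cannot parse launch syntax. The .cu file exposes an ordinary host function that the extension calls, and the extension links against cudart. Below is a minimal sketch of the build side, assuming a Linux CUDA toolkit under /usr/local/cuda (or $CUDA_HOME); the helper names and file names (kernel.cu, gpuext.cpp) are hypothetical.

```python
import os
import subprocess

# Assumption: Linux toolkit layout; override with the CUDA_HOME environment variable.
CUDA_HOME = os.environ.get("CUDA_HOME", "/usr/local/cuda")


def nvcc_object_cmd(cu_source, obj_path):
    """nvcc command that compiles a .cu file into position-independent object code.

    -fPIC is forwarded to the host compiler so the object can be linked
    into a shared library (the Python extension module).
    """
    return ["nvcc", "-c", "-O2", "-Xcompiler", "-fPIC", "-o", obj_path, cu_source]


def build_cuda_objects(cu_sources):
    """Run nvcc on each .cu file before setuptools builds the extension."""
    objects = []
    for src in cu_sources:
        obj = os.path.splitext(src)[0] + ".o"
        subprocess.run(nvcc_object_cmd(src, obj), check=True)
        objects.append(obj)
    return objects


def cuda_extension_kwargs(sources, cuda_objects, cuda_home=CUDA_HOME):
    """Keyword arguments for setuptools.Extension that link nvcc-built objects.

    `sources` are the files setuptools compiles itself (.cpp or .pyx);
    `cuda_objects` are the .o files produced by nvcc.
    """
    lib_dir = os.path.join(cuda_home, "lib64")
    return {
        "sources": list(sources),
        "extra_objects": list(cuda_objects),
        "include_dirs": [os.path.join(cuda_home, "include")],
        "library_dirs": [lib_dir],
        # Embeds an rpath so the module can find libcudart.so at import time.
        "runtime_library_dirs": [lib_dir],
        "libraries": ["cudart"],
        # Link with the C++ driver, since nvcc objects may need the C++ runtime.
        "language": "c++",
    }
```

In a setup.py this would be used roughly as `Extension("gpuext", **cuda_extension_kwargs(["gpuext.cpp"], build_cuda_objects(["kernel.cu"])))` inside `setup(ext_modules=[...])`. For Cython, list the .pyx file in `sources` and pass the extension through `cythonize`. The paths and the rpath option assume Linux; on Windows the toolkit's library directory is lib\x64.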

Python ctypes is also very easy to use. Here is a full example of using ctypes with CUDA.
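
The Python side of the ctypes route might look like the following minimal sketch. It assumes a hypothetical libvecadd.so built with `nvcc -shared -Xcompiler -fPIC -o libvecadd.so vecadd.cu` that exports `extern "C" int vector_add(const float* a, const float* b, float* out, int n)`, a host function that copies the inputs to the device, launches the kernel, copies the result back, and returns a cudaError_t value (0 on success). The library name, function name, and signature are assumptions, not taken from the linked example.

```python
import ctypes

FloatPtr = ctypes.POINTER(ctypes.c_float)


def load_vecadd(path="./libvecadd.so"):
    """Load the (hypothetical) nvcc-built library and declare the C signature."""
    lib = ctypes.CDLL(path)
    lib.vector_add.argtypes = [FloatPtr, FloatPtr, FloatPtr, ctypes.c_int]
    lib.vector_add.restype = ctypes.c_int  # cudaError_t; 0 means cudaSuccess
    return lib.vector_add


def call_vector_add(fn, a, b):
    """Marshal two Python sequences into C float arrays, call fn, return a list."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    ArrayType = ctypes.c_float * n
    a_c, b_c, out_c = ArrayType(*a), ArrayType(*b), ArrayType()
    status = fn(a_c, b_c, out_c, n)  # arrays are passed as float* pointers
    if status != 0:
        raise RuntimeError(f"vector_add failed with CUDA error {status}")
    return list(out_c)
```

With a real build, usage would be `vector_add = load_vecadd()` followed by `call_vector_add(vector_add, [1.0, 2.0], [3.0, 4.0])`. Declaring `argtypes` and `restype` lets ctypes check and convert the arguments instead of guessing, which is the main source of silent bugs with ctypes.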

To mix CUDA, C++, and Python, check out how CuPy accomplishes this with NVRTC, the CUDA runtime compilation library.
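
NVRTC compiles CUDA C++ source held in a string into PTX at run time; this is the mechanism behind CuPy's `cupy.RawKernel`. A minimal sketch of calling NVRTC directly through ctypes follows, assuming the toolkit's libnvrtc can be located by `ctypes.util.find_library` (which may require the toolkit's library directory to be registered with the dynamic linker). The helper names are hypothetical. Loading and launching the resulting PTX (for example with the driver API's `cuModuleLoadData` and `cuLaunchKernel`) is not shown.

```python
import ctypes
import ctypes.util

NVRTC_SUCCESS = 0


def arch_options(compute_capability):
    """NVRTC options targeting a virtual architecture, e.g. (7, 0) -> compute_70."""
    major, minor = compute_capability
    return [f"--gpu-architecture=compute_{major}{minor}".encode()]


def load_nvrtc():
    """Locate and load the NVRTC shared library shipped with the CUDA toolkit."""
    name = ctypes.util.find_library("nvrtc")
    if name is None:
        raise OSError("libnvrtc not found; is the CUDA toolkit installed?")
    return ctypes.CDLL(name)


def compile_to_ptx(nvrtc, source, name="kernel.cu", options=()):
    """Compile CUDA C++ source (str) to PTX (bytes) with NVRTC."""
    prog = ctypes.c_void_p()  # nvrtcProgram is an opaque pointer
    res = nvrtc.nvrtcCreateProgram(
        ctypes.byref(prog), source.encode(), name.encode(), 0, None, None
    )
    if res != NVRTC_SUCCESS:
        raise RuntimeError(f"nvrtcCreateProgram failed with code {res}")
    try:
        opts = (ctypes.c_char_p * len(options))(*options)
        res = nvrtc.nvrtcCompileProgram(prog, len(options), opts)
        if res != NVRTC_SUCCESS:
            # On failure, fetch the compiler log so the error is readable.
            log_size = ctypes.c_size_t()
            nvrtc.nvrtcGetProgramLogSize(prog, ctypes.byref(log_size))
            log = ctypes.create_string_buffer(log_size.value)
            nvrtc.nvrtcGetProgramLog(prog, log)
            raise RuntimeError(log.value.decode(errors="replace"))
        ptx_size = ctypes.c_size_t()
        nvrtc.nvrtcGetPTXSize(prog, ctypes.byref(ptx_size))
        ptx = ctypes.create_string_buffer(ptx_size.value)
        nvrtc.nvrtcGetPTX(prog, ptx)
        return ptx.value
    finally:
        nvrtc.nvrtcDestroyProgram(ctypes.byref(prog))
```

Passing a string that contains an `extern "C" __global__` kernel to `compile_to_ptx(load_nvrtc(), src, options=arch_options((7, 0)))` should return its PTX as bytes; `extern "C"` keeps the kernel name unmangled so it can be looked up by name when the module is loaded.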