pyCUDA best practice to keep C++ kernels separate from python code

dag.lindbo · April 10, 2019, 2:56pm

Hi all!

Sorry if this is a common beginner’s question, but I’d love to get the community view on how to use pyCUDA in a context where writing the kernel source (C++) code in a python string is not viable.

Basically, my team is looking for a clean way to migrate test cases and development flows to be python-based, but still code kernels in C++ for inclusion into production environments that are pure C++ cuda/dpdk further down the CI/CD pipeline. (Also, we do love to see that C++ code in an IDE, as usual…)

Ideally, I’d like kernels in separate .cu source files that are built with nvcc prior to spawning the python3 runtime. Haven’t found an example with this setup – tutorials and examples all seem to have pycuda.compiler.SourceModule take a string and run nvcc from within the python process, rather than taking a path to a compiled cuda object file.

Now, I guess the other option is to put all kernels in headers and pass them via include-path to SourceModule… Is this the preferred option, or is there some other way?
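For reference, a minimal sketch of that header route, assuming a hypothetical kernels.cuh that holds the __global__ functions (the file name and both helper names are made up for illustration). SourceModule still runs nvcc from inside the python process, but the kernel source lives in an ordinary file that the IDE and the C++ build can share:

```python
from pathlib import Path


def wrapper_source(header_name):
    """Return a one-line translation unit that only includes the kernel header."""
    return f'#include "{header_name}"\n'


def compile_from_header(header_path):
    """Compile kernels kept in a header via SourceModule (needs pycuda, nvcc and a GPU)."""
    # Imported lazily so the helper above can be used without a CUDA installation.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    from pycuda.compiler import SourceModule

    header = Path(header_path).resolve()
    return SourceModule(
        wrapper_source(header.name),
        include_dirs=[str(header.parent)],  # lets nvcc find the header
    )
```

Usage would be something like mod = compile_from_header("src/kernels.cuh") followed by mod.get_function("multiply_them"). By default SourceModule wraps the source in extern "C", so a header that pulls in templated C++ code would probably need no_extern_c=True plus an explicit extern "C" around the kernels.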

Cheers! /Dag

kkraus · April 11, 2019, 10:22pm

Hi Dag,

If your kernels and C++ are precompiled into a shared library, you can write Python bindings to the C++ functions through a variety of different technologies. For example in RAPIDS cudf (https://github.com/rapidsai/cudf):

We define all of our CUDA and C++ and compile it as you would normally compile via nvcc / gcc / g++: https://github.com/rapidsai/cudf/tree/branch-0.7/cpp
We then write Python bindings to these functions using Cython: https://github.com/rapidsai/cudf/tree/branch-0.7/python/cudf/bindings

There are numerous alternatives to Cython such as ctypes (https://docs.python.org/3/library/ctypes.html), cffi (https://cffi.readthedocs.io/en/latest/), pybind11 (https://pybind11.readthedocs.io/en/stable/), etc. that can all be used to bind Python to C/C++ functions.
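As a rough illustration of the ctypes route only: the sketch below assumes a hypothetical shared library libkernels.so that exports an extern "C" host function multiply_them_host(float *dest, const float *a, const float *b, int n), which copies the data to the GPU, launches the kernel and returns 0 on success. None of these names come from cudf or from this thread.

```python
import ctypes


def bind_multiply(lib):
    """Declare the C signature of the hypothetical host wrapper exported by lib."""
    fn = lib.multiply_them_host  # hypothetical extern "C" symbol
    fn.argtypes = [ctypes.POINTER(ctypes.c_float)] * 3 + [ctypes.c_int]
    fn.restype = ctypes.c_int
    return fn


def multiply(fn, a, b):
    """Call the bound function on two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    n = len(a)
    FloatArray = ctypes.c_float * n
    dest = FloatArray()
    # ctypes passes array instances where a POINTER(c_float) is expected.
    status = fn(dest, FloatArray(*a), FloatArray(*b), n)
    if status != 0:
        raise RuntimeError(f"multiply_them_host failed with status {status}")
    return list(dest)


def load(path="./libkernels.so"):
    """Load the (hypothetical) precompiled library and bind the wrapper."""
    return bind_multiply(ctypes.CDLL(path))
```

With such a library in place, multiply(load(), [1.0, 2.0], [3.0, 4.0]) would run the kernel without any pycuda involvement; all device code stays behind the host function.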

Does that answer your question?

dag.lindbo · April 12, 2019, 1:37pm

Cython is familiar and makes sense on a basic level. Thanks!

It’s maybe a bit secondary, but could Cython really be used with pyCUDA??

What I had in mind was instead of:

from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

we would have something like

from pycuda.compiler import BinaryModule
mod = BinaryModule("foo.cubin")
multiply_them = mod.get_function("multiply_them")

Clearly, some parts are missing here, even though the basic linkage mechanism is the same …

If this isn’t under consideration we’ll write a custom wrapper with Cython for our testcase driver. No problem!

/Dag

Robert_Crovella · April 12, 2019, 2:05pm

No, you can’t use Cython to directly load and access a cubin. You would need some additional driver API code (at a minimum, calls into the CUDA driver API library, which could probably be done using python ctypes/cython). That is why the previous response said “If your kernels and C++ are precompiled into a shared library”. If you create a shared library that is used in a CUDA program, by definition there is no device code linking (or, basically, exposure) over the interface.

pycuda is built on the CUDA driver API. With a bit of effort you should be able to intermix calls to the driver API for anything that pycuda can’t do directly. If you do this yourself, then of course you’ll need to integrate some C++ host code (calls) to call the CUDA driver API (library), because that is a C++ (host) library. There are no python bindings for it directly that I know of. However, pycuda is pretty well engineered, in my opinion, and if you poke around in pycuda, I wouldn’t be surprised if you find what are effectively bindings for the things you need, i.e. loading a cubin, extracting the kernel, etc. For instance, see here:

https://documen.tician.de/pycuda/driver.html

check the “Code on the Device: Modules and Functions” section.

pycuda isn’t a NVIDIA product. It’s created and maintained by someone else. So asking “if this isn’t under consideration” on these forums might not make sense, if you are looking for changes to pycuda.

dag.lindbo · April 12, 2019, 2:15pm

Ok, sounds good. Sorry for suggesting you consider this as a feature – I found pyCUDA via NVIDIA dev pages and didn’t really think about who maintains it. /d

czero69 · June 21, 2021, 1:36pm

As Robert Crovella mentioned, it seems pycuda has a nice interface to load a cubin.

For me the refactoring from python string (pycuda compilation) to nvcc compilation and loading cubin was as follows:

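Copy the kernel code to an external cuda_kernels.cu file. Note: I have to use extern "C" for pycuda reasons (without it the kernel name gets C++-mangled and get_function can’t find it by its plain name).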

#include <cuda_fp16.h>
extern "C"{
    __global__ void my_kernel(...)
}

Compile to a cubin with nvcc. For the -arch=sm_XX value that matches your hardware, see the NVIDIA docs. You have to specify it to avoid a pycuda error.

nvcc -Xptxas -O3,-v -arch=sm_75 -cubin ./cuda_kernels.cu
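Since the original question was about building the kernels before the python tests run, the same nvcc call could be scripted as a build step. A minimal sketch, assuming nvcc is on PATH; the helper names and the -o output location are my own additions, not part of the command above:

```python
import subprocess
from pathlib import Path


def nvcc_cubin_command(cu_file, arch="sm_75", out_dir="."):
    """Build the nvcc argument list that compiles one .cu file to a .cubin."""
    cu_path = Path(cu_file)
    cubin_path = Path(out_dir) / (cu_path.stem + ".cubin")
    return [
        "nvcc", "-Xptxas", "-O3,-v", f"-arch={arch}",
        "-cubin", str(cu_path), "-o", str(cubin_path),
    ]


def build_cubin(cu_file, arch="sm_75", out_dir="."):
    """Run nvcc and return the path of the generated cubin."""
    cmd = nvcc_cubin_command(cu_file, arch, out_dir)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if nvcc fails
    return cmd[-1]
```

A test fixture or a CI step could call build_cubin("cuda_kernels.cu") once and hand the returned path to the loading code.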

Refactor the python code to use the pycuda interface for loading cubins

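import pycuda.autoinit  # noqa: F401  (creates the CUDA context that module_from_file needs)
import pycuda.driver as cuda


def load_kernels(cubin_file):
    # Load the precompiled cubin and look up the kernel by its extern "C" name.
    mod = cuda.module_from_file(cubin_file)
    func_my_kernel = mod.get_function("my_kernel")
    return func_my_kernel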
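To show the launch side as well, here is a sketch that assumes the multiply_them kernel from earlier in the thread was compiled into a cubin inside an extern "C" block (the cubin file name and the helper names are hypothetical). Because that kernel indexes with threadIdx.x only, everything has to fit into a single block:

```python
MAX_THREADS_PER_BLOCK = 1024  # per-block thread limit on current NVIDIA GPUs


def block_shape(n):
    """One thread per element in a single block (the kernel uses threadIdx.x only)."""
    if not 0 < n <= MAX_THREADS_PER_BLOCK:
        raise ValueError(f"need 1..{MAX_THREADS_PER_BLOCK} elements, got {n}")
    return (n, 1, 1)


def multiply_from_cubin(cubin_file, a, b):
    """Load multiply_them from a cubin and run it (needs numpy, pycuda and a GPU)."""
    # Imported lazily so block_shape can be used without a CUDA installation.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda

    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of the same length")

    mod = cuda.module_from_file(cubin_file)
    multiply_them = mod.get_function("multiply_them")

    dest = np.empty_like(a)
    # cuda.In / cuda.Out copy the host arrays to and from the device around the call.
    multiply_them(cuda.Out(dest), cuda.In(a), cuda.In(b),
                  block=block_shape(a.size), grid=(1, 1))
    return dest
```

For example, multiply_from_cubin("multiply.cubin", a, b) with two float32 arrays of 400 elements should give the same result as a * b in numpy.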
