I have been trying to call a CUDA function from Python, following the CUDA Python tutorial here: Overview - CUDA Python 11.6.0 documentation.
The example looks like this:
from cuda import cuda, nvrtc
import numpy as np
# The CUDA program to be called
saxpy = """\
extern "C" __global__
void saxpy(float a, float *x, float *y, float *out, size_t n)
{
    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        out[tid] = a * x[tid] + y[tid];
    }
}
"""
# Create program
err, prog = nvrtc.nvrtcCreateProgram(str.encode(saxpy), b"saxpy.cu", 0, [], [])
# Compile program
opts = [b"--fmad=false", b"--gpu-architecture=compute_75"]
err, = nvrtc.nvrtcCompileProgram(prog, 2, opts)
# Get PTX from compilation
err, ptxSize = nvrtc.nvrtcGetPTXSize(prog)
ptx = b" " * ptxSize
err, = nvrtc.nvrtcGetPTX(prog, ptx)
# ... more
However, the CUDA function I am trying to call belongs to a large project and depends on several other .cu/.cuh files in that project, which makes it impractical to write out as a string the way the tutorial example does.
Is there a way to call the CUDA function more conveniently, or should I copy and paste all those .cu/.cuh files into a string as the example does?
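For illustration, here is a minimal sketch of one possible alternative to pasting the sources by hand: read the .cu file from disk and point NVRTC at the header directories with its `--include-path` option. The helper names (`load_kernel_source`, `build_nvrtc_options`, `compile_to_ptx`) and the paths in the usage note are hypothetical, and `compute_75` is simply carried over from the tutorial example. It also assumes the entry file pulls in its dependencies through `#include`, since NVRTC compiles a single source string and handles device code only.

```python
from pathlib import Path


def load_kernel_source(cu_path):
    """Read a .cu file from disk and return its contents as bytes for NVRTC."""
    return Path(cu_path).read_bytes()


def build_nvrtc_options(include_dirs, arch="compute_75"):
    """Build the NVRTC option list, adding one --include-path per header directory."""
    opts = [b"--fmad=false", f"--gpu-architecture={arch}".encode()]
    for d in include_dirs:
        opts.append(f"--include-path={Path(d).resolve()}".encode())
    return opts


def compile_to_ptx(cu_path, include_dirs):
    """Compile one .cu file to PTX (needs the cuda-python package and a CUDA toolkit)."""
    # Imported lazily so the two helpers above can be used without CUDA installed.
    from cuda import nvrtc

    src = load_kernel_source(cu_path)
    opts = build_nvrtc_options(include_dirs)

    # Same calls as in the tutorial, but the source comes from a file.
    err, prog = nvrtc.nvrtcCreateProgram(src, Path(cu_path).name.encode(), 0, [], [])
    err, = nvrtc.nvrtcCompileProgram(prog, len(opts), opts)
    if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
        raise RuntimeError(f"NVRTC compilation failed: {err}")

    err, ptx_size = nvrtc.nvrtcGetPTXSize(prog)
    ptx = b" " * ptx_size
    err, = nvrtc.nvrtcGetPTX(prog, ptx)
    return ptx
```

Usage would be something like `ptx = compile_to_ptx("project/kernels/saxpy.cu", ["project/include"])`, with both paths made up for the example. Whether this is enough for a project with several separately compiled .cu files, or with host code mixed into them, is exactly the part I am unsure about.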