I’m trying to add some processing functionality to the CUDA Video Decoder example (cudaDecodeGL in the CUDA Samples directory of the CUDA 8.0 install). I’m new to PTX files and the module manager (I learned how to define and launch kernels with the approach described in the ‘CUDA by Example’ book). As far as I can tell, the process for creating and launching a new kernel is as follows (I’m adding my code to the existing xxxx.cu file):
1. Create the CUDA kernel in the NV12ToARGB_drvapi.cu file. For example, I just added these lines at the bottom:
__global__ void kernel(void)
{}
Certainly a NOP kernel, but a kernel nonetheless.
Then, in the videoDecodeGL.cpp file:
2. Define a CUfunction handle for my kernel:
CUfunction g_mykernel = 0;
3. There is already a line of code that creates the module manager:
g_pCudaModule = new CUmoduleManager("NV12ToARGB_drvapi64.ptx", exec_path, 2, 2, 2);
4. Then add the line:
g_pCudaModule->GetCudaFunction("kernel", &g_mykernel);
However, when I add this last line of code and run the program, it crashes at that line with the error:
Exception thrown at 0x00007FF613174017 in my_cudaDecodeGL_v2.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.
If I comment out this line, the program runs normally.
I’m working on Windows 10 with Visual Studio 2015. I’m assuming (without having verified it) that the .ptx file is regenerated from my modified .cu file whenever the overall project is built. Is that right? Or do I have to run a separate nvcc compilation of the .cu file before building and running the main program?
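One check I can do without the debugger, since a .ptx file is plain text, is to list the kernel names it actually contains and see whether "kernel" is among them. Below is a minimal sketch, assuming each kernel name directly follows a `.entry` directive in the PTX text; `listPtxEntries` is a hypothetical helper of my own, not part of the sample or the CUDA toolkit. If the .ptx file was not rebuilt after my edit, I would expect the new kernel to be missing from the list. If nvcc applied C++ name mangling to it, I would expect it to show up under a mangled name (something like `_Z6kernelv`) rather than as plain `kernel`.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of the CUDA sample): returns the identifier
// that follows each ".entry" directive found in the given PTX text.
std::vector<std::string> listPtxEntries(const std::string& ptxText) {
    std::vector<std::string> names;
    const std::string directive = ".entry";
    std::size_t pos = 0;
    while ((pos = ptxText.find(directive, pos)) != std::string::npos) {
        pos += directive.size();
        // Skip the whitespace between ".entry" and the kernel name.
        while (pos < ptxText.size() &&
               std::isspace(static_cast<unsigned char>(ptxText[pos]))) {
            ++pos;
        }
        // Consume identifier characters: letters, digits, '_' and '$'.
        std::size_t end = pos;
        while (end < ptxText.size()) {
            const unsigned char c = static_cast<unsigned char>(ptxText[end]);
            if (!std::isalnum(c) && c != '_' && c != '$') break;
            ++end;
        }
        if (end > pos) names.push_back(ptxText.substr(pos, end - pos));
        pos = end;
    }
    return names;
}
```

To use it, I would read the whole .ptx file into a std::string (for example with std::ifstream and std::stringstream), pass it to the function, and print each returned name.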
Clearly, I’m not adding the new kernel in the correct way. However, I’m having a hard time finding resources online that explain what to do and help me figure out what I’m doing wrong. Can anyone help?