I have started implementing some CUDA processing with the runtime compilation library (NVRTC) so that kernels can be compiled on the fly. My application does image processing, and large parts of the code depend on user input, so a short compilation step before processing a batch of images seemed like the best approach.
But now I have a problem executing the code: cuLaunchKernel fails with CUDA_ERROR_INVALID_VALUE.
I have packed my experiments into a single-file demo, pasted at https://bpaste.net/show/a90b0436770d
It compiles on my Mac OS X Yosemite 10.10.5 system (CUDA 7.5, clang/LLVM version 7.0.2) with
clang++ nvrtc_test_single.cpp -o cudartctest-single -I $CUDA_PATH/include -L $CUDA_PATH/lib -lnvrtc -lcuda -lcudart -F/Library/Frameworks -framework CUDA -Wl,-rpath,$CUDA_PATH/lib
The resulting output is:
Using CUDA device : GeForce GTX 750
CUDA init - time: 87.081001 ms
Fileinfo: Width=1920, Height=1080
CUDA CUDA - Memory-Prep - time: 9.111000 ms
nvrtcProgramLog:
CUDA - Kernel RTC - time: 872.619019 ms
Grid dimensions: 15 x 1080
error: cuLaunchKernel( kernel, CUDA_X_DIM, 1, 1, grid_dim_x, rgb->height, 1, 0, NULL, args, NULL) failed with error CUDA_ERROR_INVALID_VALUE
So I expect the problem is in either line 192 or line 198 of the pasted code, but I don’t see what is wrong, since it should match the sample from http://docs.nvidia.com/cuda/nvrtc/index.html apart from slight modifications.
(The saxpy sample from the NVRTC documentation works fine as is, with the same compiler options.)
Does anyone have an idea what the problem is? I have searched for over a day but can’t find any helpful information on how to debug this.
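One thing that can be checked on the host before the launch is the launch geometry itself. The driver API declares cuLaunchKernel with the grid dimensions (in blocks) first and the block dimensions (in threads) second, and on compute capability 2.0 and newer a block may hold at most 1024 threads (at most 1024 x 1024 x 64 per axis). Below is a minimal sketch of such a check in plain C++, without any CUDA calls. The names LaunchDims, ceil_div, dims_for_image and block_is_valid are hypothetical helpers made up for this illustration, and the block width of 128 is only an assumption, since the value of CUDA_X_DIM is not shown in this post.

```cpp
#include <cassert>

// Launch geometry in the order cuLaunchKernel takes it:
// grid dimensions (in blocks) first, then block dimensions (in threads).
// Hypothetical helper type, not part of the CUDA API.
struct LaunchDims {
    unsigned grid_x, grid_y, grid_z;
    unsigned block_x, block_y, block_z;
};

// Number of blocks needed to cover n items with `block` threads per block.
unsigned ceil_div(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// One thread per pixel: block_x threads along a row, one grid row per image row.
LaunchDims dims_for_image(unsigned width, unsigned height, unsigned block_x) {
    return LaunchDims{ceil_div(width, block_x), height, 1, block_x, 1, 1};
}

// Per-block limits for compute capability 2.0 and newer:
// at most 1024 x 1024 x 64 per axis and at most 1024 threads in total.
bool block_is_valid(const LaunchDims& d) {
    if (d.block_x == 0 || d.block_y == 0 || d.block_z == 0) return false;
    if (d.block_x > 1024 || d.block_y > 1024 || d.block_z > 64) return false;
    unsigned long long threads =
        static_cast<unsigned long long>(d.block_x) * d.block_y * d.block_z;
    return threads <= 1024;
}
```

For a 1920 x 1080 image and the assumed block width of 128, dims_for_image gives a grid of 15 x 1080 blocks with 128 threads each, which passes the check. If the same numbers end up in the block positions instead (a block of 15 x 1080 threads), the check fails, so it may be worth comparing the argument order of the launch against the declaration in cuda.h.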