Kernel doesn't work without CUDA_PROFILE: weird problem


I just encountered a weird problem.
I have built a solution with a couple of kernels that do image processing.
The kernels work perfectly, but ONLY when CUDA_PROFILE is set to 1 in the environment variables.

If I set it to 0, nothing works.
Has anybody experienced the same problem?

Also, when I run the .exe with all the necessary DLLs on other systems that have the 169.21 display driver installed, it only works if the toolkit is installed and CUDA_PROFILE is set to 1.

I just don’t want to install the toolkit on every system.

Any help is appreciated!

One side effect of CUDA_PROFILE is that kernel launches are done synchronously. Is it possible your code relies on that side effect?

OK, so if I set CUDA_PROFILE to 1, all kernels are launched synchronously? I didn’t know that, thanks!

In my code, each kernel call is wrapped in a C++ function,
and the functions are executed one after another.
Maybe I should put some cudaThreadSynchronize() calls in the code.

I’ll try that!
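
For anyone reading along, here is a rough sketch of what such a wrapper could look like. The kernel `invertKernel` and the wrapper `invertImage` are made-up names for illustration, not the code from this thread. It also uses cudaDeviceSynchronize(), which replaced cudaThreadSynchronize() in later toolkits; on an old toolkit you would call cudaThreadSynchronize() in the same place.

```cuda
#include <cuda_runtime.h>

// Hypothetical image-processing kernel: inverts 8-bit pixel values.
__global__ void invertKernel(unsigned char* pixels, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        pixels[i] = 255 - pixels[i];
    }
}

// Hypothetical C++ wrapper, one per kernel, as described above.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t invertImage(unsigned char* devPixels, int count)
{
    const int threads = 256;
    const int blocks = (count + threads - 1) / threads;

    // The launch returns to the host right away; the kernel may still be running.
    invertKernel<<<blocks, threads>>>(devPixels, count);

    // Report errors from the launch itself (e.g. a bad configuration).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        return err;
    }

    // Block the host until the kernel has finished, so the wrapper only
    // returns once the work is done. This is the explicit version of the
    // behavior that CUDA_PROFILE=1 provided as a side effect.
    return cudaDeviceSynchronize();
}
```

Checking the return value of the synchronize call is useful too, since errors that happen while the kernel runs only show up there or in a later call.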


Thanks for the hint! Where did you get that from?
cudaThreadSynchronize() was the solution, and everything works fine now!

case closed…

Well, he is from NVIDIA ;)