Kernels don't work without CUDA_PROFILE (weird problem)

Hi,

I just encountered a weird problem.
I have built a solution with a couple of kernels that do image processing.
The kernels work perfectly, but ONLY when CUDA_PROFILE is set to 1 in the environment variables.

If I set it to 0, nothing works.
Has anybody experienced the same problem?

Now, if I want to run the .exe with all the necessary DLLs on different systems that have the 169.21 display driver installed, it only works when the toolkit is installed and CUDA_PROFILE is set to 1.

I just don’t want to install the toolkit on every system.

Any help is appreciated!
thanks!

One side effect of CUDA_PROFILE is that kernel launches are done synchronously. Is it possible your code relies on that side effect?

OK, so if I set CUDA_PROFILE to 1, all kernels are launched synchronously? I didn’t know that, thanks!

In my code each kernel call is wrapped in a C++ function,
and the functions are executed one after another.
Maybe I should put some cudaThreadSynchronize() calls in the code.

I’ll try that!

@nwilt

Thanks for the hint! Where did you get that from?
Adding cudaThreadSynchronize() was the solution and everything is just fine!
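
For anyone who finds this later, here is a rough sketch of the pattern. It is not my actual code: the kernel `invert` and the wrapper `invertImage` are made-up names standing in for one image-processing step, and the launch configuration is just an assumption. The point is only that the kernel launch returns right away, so the wrapper waits with cudaThreadSynchronize() before it returns. (On newer toolkits cudaThreadSynchronize() is deprecated in favor of cudaDeviceSynchronize().)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for one image-processing step.
__global__ void invert(unsigned char *img, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        img[i] = 255 - img[i];
}

// Hypothetical C++ wrapper around the kernel call.
// Returns true if both the launch and the kernel execution succeeded.
bool invertImage(unsigned char *d_img, int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // The launch is asynchronous: control returns to the host immediately.
    invert<<<blocks, threads>>>(d_img, n);

    // Catch errors from the launch itself (bad configuration etc.).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // Block the host until the kernel has finished, so the caller can
    // rely on the result (and on any execution error being reported here).
    err = cudaThreadSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1024;
    unsigned char h_img[n];
    for (int i = 0; i < n; ++i)
        h_img[i] = (unsigned char)(i % 256);

    unsigned char *d_img = 0;
    if (cudaMalloc((void **)&d_img, n) != cudaSuccess)
        return 1;
    cudaMemcpy(d_img, h_img, n, cudaMemcpyHostToDevice);

    bool ok = invertImage(d_img, n);

    cudaMemcpy(h_img, d_img, n, cudaMemcpyDeviceToHost);
    cudaFree(d_img);

    printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

Synchronizing after every kernel is the simple fix; it costs some overlap between host and device work, so it may be enough to synchronize only at the points where the host actually needs the results.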

case closed…

Well, he is from NVIDIA ;)