In this example, the kernel “adder” doesn’t work at all. All elements in “host_buff” are 0.
However, if I comment out “cudaSetDevice(0),” it works perfectly. Every element in “host_buff” is 1.
Does anyone know why “cudaSetDevice(0)” (kind of) disables the kernel?
I read the CUDA Programming Guide 4.0 and found that the default current device is 0.
But even if I add a redundant “cudaSetDevice(0)”, that shouldn't matter, right?
Btw, I also called cudaGetDeviceCount(), so I am sure that I have 2 GPUs in my machine.
Check the return codes of all CUDA calls, particularly of the cudaSetDevice() call that seems to be failing; that should tell you what is going wrong.
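For example, a minimal error-checking sketch that can be wrapped around every runtime call (the CHECK_CUDA name is just an illustration, not from the original code):

==== error-checking sketch ====
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK_CUDA(call)                                                    \
    do {                                                                    \
        cudaError_t err = (call);                                           \
        if (err != cudaSuccess) {                                           \
            fprintf(stderr, "%s:%d: %s failed: %s\n",                       \
                    __FILE__, __LINE__, #call, cudaGetErrorString(err));    \
            exit(EXIT_FAILURE);                                             \
        }                                                                   \
    } while (0)

// usage:
// CHECK_CUDA(cudaSetDevice(0));
// CHECK_CUDA(cudaMemcpy(host_buff, dev_buff, size, cudaMemcpyDeviceToHost));
===============================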
But I checked the return code of cudaSetDevice(0); it is cudaSuccess.
I also used cudaGetDeviceProperties() to check the compute modes of my devices.
Both of my GPUs are in the default compute mode.
If I didn’t misunderstand the programming guide, the default mode allows multiple CUDA contexts on the same device.
But, coming back to my example, it is so simple that there is only 1 context on each device…
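For reference, here is a small sketch of that compute-mode query (the printout format is illustrative):

==== compute-mode query sketch ====
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, computeMode = %d (%s)\n",
               dev, prop.name, prop.computeMode,
               prop.computeMode == cudaComputeModeDefault ? "Default" : "non-Default");
    }
    return 0;
}
===================================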
What’s the return code of the cudaMemcpy after the kernel call? I’m pretty sure CUDA doesn’t just fail without indicating an error (if it did, that would be a genuine bug and worth reporting).
What GPU is device #0 (run the device query example from the SDK to find out)? What compute capability are you compiling your code for? What are the permissions for the device? It’s pretty moot to speculate on the error though before knowing the exact error code/message.
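To illustrate the point about checking the cudaMemcpy after the kernel call: a kernel launch is asynchronous, so a launch or execution failure typically only surfaces in the next synchronizing call. A self-contained sketch (the adder kernel body here is an assumption, not the original code):

==== kernel error-reporting sketch ====
#include <cstdio>
#include <cuda_runtime.h>

__global__ void adder(int *buff, int n) {            // stand-in for the real kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buff[i] += 1;
}

int main() {
    const int n = 256;
    int host_buff[n];
    int *dev_buff = nullptr;
    cudaMalloc(&dev_buff, n * sizeof(int));
    cudaMemset(dev_buff, 0, n * sizeof(int));

    adder<<<(n + 127) / 128, 128>>>(dev_buff, n);    // launch is asynchronous
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    // cudaMemcpy synchronizes, so a kernel-execution failure is reported here.
    cudaError_t err = cudaMemcpy(host_buff, dev_buff, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    printf("memcpy: %s\n", cudaGetErrorString(err));

    cudaFree(dev_buff);
    return 0;
}
=======================================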
OK. I cleaned up my code and found that the error only happens when I switch the current device back and forth.
For example, in my host function:
==== OK host function ====
cudaSetDevice(0);
// do something
cudaSetDevice(1);
// do something
cudaMemcpyPeer(buff_on_1, 1, buff_on_0, 0, buff_size);
==========================
This host function works fine.
However, if I change the host function like this:
==== not OK host function ====
cudaSetDevice(0);
// do something
cudaSetDevice(1);
// do something
cudaSetDevice(0);
// do something
cudaSetDevice(1);
// do something
cudaMemcpyPeer(buff_on_1, 1, buff_on_0, 0, buff_size);   // FAILED: returns “cudaErrorInvalidValue”
==============================
This version fails: the cudaMemcpyPeer returns “cudaErrorInvalidValue.”
However, I am sure I passed the correct parameters.
All I did was move some of the work on device 1 earlier…
Does CUDA not allow us to switch between devices this many times?
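For reference, here is a self-contained sketch of the back-and-forth pattern with a check after every call (the cudaMalloc calls stand in for the “do something” parts, so this sketch on its own may well succeed); whichever check fires first should show where the cudaErrorInvalidValue actually comes from:

==== back-and-forth repro sketch ====
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) {   \
    fprintf(stderr, "line %d: %s\n", __LINE__, cudaGetErrorString(e));     \
    exit(1); } } while (0)

int main() {
    const size_t buff_size = 1 << 20;
    char *buff_on_0 = nullptr, *buff_on_1 = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&buff_on_0, buff_size));   // stand-in for "do something"
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&buff_on_1, buff_size));   // stand-in for "do something"
    CHECK(cudaSetDevice(0));                    // switch back...
    CHECK(cudaSetDevice(1));                    // ...and forth again
    CHECK(cudaMemcpyPeer(buff_on_1, 1, buff_on_0, 0, buff_size));
    printf("all calls returned cudaSuccess\n");
    return 0;
}
=====================================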