Hello,
For the last couple of days, I’ve been struggling with a strange problem. I have two GeForce 590s installed in my work machine. Since each 590 is a dual-GPU card, CUDA sees 4 GPU devices in total.
The problem is that if I call cudaSetDevice(n) with n = 1, 2, or 3, I get different behavior than with n = 0. More specifically, I have a thrust::device_vector called zIndices_d. I first initialize a thrust::host_vector, zIndices_h, setting each of its 320 elements in a loop, and then set
zIndices_d=zIndices_h;
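A minimal sketch of the setup described above, assuming the vectors hold int4 elements (as the int4(...) cast used later in this post suggests). The helper names buildIndices and readElementX and the placeholder element values are hypothetical and not taken from the actual program:

```cuda
#include <cuda_runtime.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <iostream>

// Hypothetical helper: fills a host vector element by element in a loop.
// The values are placeholders (x = index); the real values are not shown here.
thrust::host_vector<int4> buildIndices(int count)
{
    thrust::host_vector<int4> zIndices_h(count);
    for (int i = 0; i < count; ++i)
        zIndices_h[i] = make_int4(i, 0, 0, 0);
    return zIndices_h;
}

// Hypothetical helper: selects a device, copies the host vector to it,
// and reads the x component of one element back on the host.
// Returns -1 if the device cannot be selected.
int readElementX(int device, int index)
{
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) {
        std::cerr << "cudaSetDevice(" << device << ") failed: "
                  << cudaGetErrorString(err) << std::endl;
        return -1;
    }

    // The device vector is created after cudaSetDevice, so its memory
    // is allocated on the selected device.
    thrust::host_vector<int4> zIndices_h = buildIndices(320);
    thrust::device_vector<int4> zIndices_d = zIndices_h;

    // Converting the device reference to int4 copies that element to the host.
    int4 value = zIndices_d[index];
    return value.x;
}
```

With these placeholder values, readElementX(n, 72) should return 72 on any working device, so calling it for n = 0 through 3 gives a direct comparison between the four GPUs.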
However, if I try to use zIndices_d in a CUDA kernel, it always triggers an out-of-bounds exception when debugging with Nsight, even when reading zIndices_d[0], but ONLY if the device is set to 1, 2, or 3. It works perfectly if the device is set to 0.
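A rough sketch of the kind of kernel access meant here, assuming the vector is handed to the kernel as a raw pointer obtained with thrust::raw_pointer_cast. The kernel readFirst, the wrapper readFirstInKernel, and the fill value 7 are hypothetical stand-ins, not the actual code:

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <cstdio>

// Hypothetical kernel: a single thread reads element 0 and stores its x component.
__global__ void readFirst(const int4* zIndices, int n, int* out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0 && n > 0)
        *out = zIndices[0].x;
}

// Hypothetical wrapper: selects a device, runs the kernel, returns the value read.
// Returns -1 if the device cannot be selected or the kernel reports an error.
int readFirstInKernel(int device)
{
    if (cudaSetDevice(device) != cudaSuccess)
        return -1;

    // Placeholder contents: 320 elements, each with x = 7.
    thrust::device_vector<int4> zIndices_d(320, make_int4(7, 0, 0, 0));
    thrust::device_vector<int> out_d(1, -1);

    readFirst<<<1, 1>>>(thrust::raw_pointer_cast(zIndices_d.data()),
                        static_cast<int>(zIndices_d.size()),
                        thrust::raw_pointer_cast(out_d.data()));

    // Check for launch errors first, then for errors raised while the kernel ran.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed on device %d: %s\n",
                    device, cudaGetErrorString(err));
        return -1;
    }
    return out_d[0];
}
```

On a device that behaves correctly, readFirstInKernel(n) should return the placeholder value 7.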
Additionally, I should be able to do something like this, in host code:
int temp = int4(zIndices_d[0]).x;
cout << temp << endl;
(Note that since zIndices_d is a Thrust vector, its elements can be read this way from host code; Thrust copies the requested element from device memory behind the scenes, so no tedious explicit copying is needed.)
However, when the device is 1, 2, or 3, the program crashes at this point, reporting an out-of-bounds error. On device 0, it works fine.
Also, if I instead try
int temp = int4(zIndices_d[72]).x;
cout << temp << endl;
it no longer crashes, but it returns the wrong value on devices 1, 2, and 3. Device 0 still gives the correct value.
I’ve also tried this program on a different machine with a GeForce 560, and it works fine there.
I’ve tried updating the drivers, reinstalling CUDA, reinstalling Nsight, and restarting the computer many times, but nothing fixes the problem.
I am running Windows 7 with 16 GB of RAM and an Intel i7 processor, and I use Visual Studio for compiling and debugging.
Does anyone have any thoughts?
Thanks!