Very strange problem. Different behavior on different device numbers.

Hello,

For the last couple of days, I’ve been struggling with a strange problem. I have two GeForce GTX 590s installed in my work machine. Since each 590 is a dual-GPU card, the system sees four CUDA devices in total.

The problem is that if I call cudaSetDevice(n) with n = 1, 2, or 3, I get different behavior than with n = 0. More specifically, I have a thrust::device_vector called zIndices_d. I first initialize a thrust::host_vector called zIndices_h, setting each of its 320 elements in a loop, and then assign

zIndices_d=zIndices_h;
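A minimal sketch of the setup described above, assuming the element type is int4 (inferred from the int4(...) casts later in the post) and using made-up placeholder values. makeZIndices is a hypothetical helper, not code from the post; it selects the GPU before any Thrust storage is allocated:

```cuda
#include <cuda_runtime.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

// Hypothetical helper: fill a host_vector<int4> in a loop, then copy it to a
// device_vector. The int4 element type and the values are assumptions.
thrust::device_vector<int4> makeZIndices(int device, int count = 320)
{
    // Select the GPU first: the device_vector allocates its storage on
    // whichever device is current at the moment memory is allocated.
    cudaSetDevice(device);

    thrust::host_vector<int4> zIndices_h(count);
    for (int i = 0; i < count; ++i)
        zIndices_h[i] = make_int4(i, i + 1, i + 2, i + 3);  // placeholder data

    thrust::device_vector<int4> zIndices_d;
    zIndices_d = zIndices_h;  // host-to-device copy, as in the post
    return zIndices_d;
}
```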

However, when I use it in a CUDA kernel while debugging with Nsight, Nsight always reports an out-of-bounds access, even for zIndices_d[0], but ONLY if the device is set to 1, 2, or 3. It works perfectly with device 0.

Additionally, I should be able to do something like this in host code:
int temp = int4(zIndices_d[0]).x;
cout << temp << endl;

(Since zIndices_d is a Thrust vector, host code can read its elements this way without writing the device-to-host copy by hand; Thrust performs the transfer behind the scenes.)
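Under the hood, zIndices_d[i] returns a thrust::device_reference, and converting it to int4 copies that single element from the GPU. A sketch that performs the same read both ways, once through Thrust and once with an explicit cudaMemcpy (elementReadsAgree is a hypothetical helper):

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>

// Reads element i of a device_vector<int4> twice and reports whether the two
// reads agree. Both reads are device-to-host transfers of one int4.
bool elementReadsAgree(const thrust::device_vector<int4>& zIndices_d, size_t i)
{
    // Implicit copy: device_reference<const int4> converts to int4.
    int4 viaThrust = zIndices_d[i];

    // Explicit copy of the same element via the raw device pointer.
    int4 viaMemcpy;
    const int4* raw = thrust::raw_pointer_cast(zIndices_d.data()) + i;
    if (cudaMemcpy(&viaMemcpy, raw, sizeof(int4), cudaMemcpyDeviceToHost) != cudaSuccess)
        return false;

    return viaThrust.x == viaMemcpy.x && viaThrust.y == viaMemcpy.y &&
           viaThrust.z == viaMemcpy.z && viaThrust.w == viaMemcpy.w;
}
```

Because every such access is a separate small transfer, reading many elements this way in a loop is much slower than one bulk copy back to a host_vector.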

However, on device 1, 2, or 3 the program crashes outright at this point, reporting an out-of-bounds error. On device 0, it works fine.

Also, if I instead try

int temp = int4(zIndices_d[72]).x;
cout << temp << endl;

It no longer crashes, but it returns the wrong value on devices 1, 2, and 3. Device 0, of course, still works fine.

I’ve also tried this program on a different machine with a GeForce 560, and it works fine there.

I’ve tried updating the drivers, reinstalling CUDA and Nsight, and restarting the computer many times; nothing fixes the problem.

I’m running Windows 7 on an Intel Core i7 machine with 16 GB of RAM, compiling and debugging in Visual Studio.

Anyone have any thoughts?

Thanks!

Do you get the same behavior on that particular machine (the one with the two GTX 590s) without using Thrust? If the problem only shows up with Thrust, it could be a bug in Thrust’s multi-GPU handling, but that’s just a guess. Also, try CUDA 4.2 if you’re using CUDA 5 (or vice versa) and see whether you can reproduce the issue on that machine.
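A sketch of the Thrust-free experiment suggested above, using only the CUDA runtime API. The kernel readX, the helper rawCudaCheck, and the placeholder data are illustrative assumptions; the default index 72 is taken from the original post:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: copies the .x field of data[index] into *out.
__global__ void readX(const int4* data, int index, int* out)
{
    *out = data[index].x;
}

// Same experiment without Thrust: select the device, allocate, upload,
// read one element in a kernel, copy the result back, and compare.
bool rawCudaCheck(int device, int count = 320, int index = 72)
{
    if (index < 0 || index >= count) return false;
    if (cudaSetDevice(device) != cudaSuccess) return false;

    std::vector<int4> host(count);
    for (int i = 0; i < count; ++i) host[i] = make_int4(i, 0, 0, 0);  // placeholder data

    int4* d_data = nullptr;
    int* d_out = nullptr;
    if (cudaMalloc(&d_data, count * sizeof(int4)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) { cudaFree(d_data); return false; }
    cudaMemcpy(d_data, host.data(), count * sizeof(int4), cudaMemcpyHostToDevice);

    readX<<<1, 1>>>(d_data, index, d_out);
    cudaError_t err = cudaGetLastError();  // catches launch errors

    int result = -1;
    if (err == cudaSuccess)
        err = cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_data);
    return err == cudaSuccess && result == host[index].x;
}
```

If rawCudaCheck succeeds on every device while the Thrust version fails, that points at how the Thrust vectors are being created rather than at the hardware or driver.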

Edit: see http://stackoverflow.com/questions/8289860/multiple-gpus-with-cuda-thrust. In particular, this advice might explain what you’re encountering: “Just keep in mind that you will need to create and operate on separate vectors on each device”.
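A sketch of the “separate vectors on each device” pattern from the linked answer, assuming a simple reduction stands in for real work (sumPerDevice is hypothetical). Each vector is created, used, and destroyed while its own device is current:

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <vector>

// For every GPU: make it current, build a device_vector there, work on it,
// and let it go out of scope while that same device is still current.
std::vector<long long> sumPerDevice(int count)
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    std::vector<long long> sums;
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                        // 1. select the GPU
        thrust::device_vector<int> v(count, dev);  // 2. allocate on that GPU
        sums.push_back(thrust::reduce(v.begin(), v.end(), 0LL));  // 3. work there
    }                                              // 4. v is freed here, on dev
    return sums;
}
```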

I see, thanks! It turns out that I was calling cudaSetDevice AFTER the Thrust vectors were constructed. Switching the order fixed the problem.
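A sketch of the corrected order, plus a check of where the allocation actually lives. It assumes a 64-bit build with unified virtual addressing, which cudaPointerGetAttributes relies on to identify the owning device; owningDevice and allocatedOnSelectedDevice are hypothetical helpers:

```cuda
#include <cuda_runtime.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

// Returns the index of the device that owns a device_vector's storage,
// or -1 if the query fails.
template <typename T>
int owningDevice(const thrust::device_vector<T>& v)
{
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, thrust::raw_pointer_cast(v.data())) != cudaSuccess)
        return -1;
    return attr.device;
}

// The order that fixed the problem: cudaSetDevice(n) FIRST, then construct
// and fill the Thrust vectors. Returns true if the storage is on device n.
bool allocatedOnSelectedDevice(int n)
{
    if (cudaSetDevice(n) != cudaSuccess) return false;

    thrust::host_vector<int4> zIndices_h(320, make_int4(0, 0, 0, 0));
    thrust::device_vector<int4> zIndices_d(zIndices_h);

    return owningDevice(zIndices_d) == n;
}
```

Calling allocatedOnSelectedDevice(d) for each device index is one way to confirm that every vector landed on the intended GPU.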