CUDA 2.3 problems with multiple GPUs (using more than one for a single process)

Hey folks, we recently upgraded our Linux machines (RHEL5, x86_64) to the CUDA 2.3 toolkit and 190.18 Beta drivers. One of our users reports problems with his code after the upgrade. Essentially, his code, which is supposed to run on a single device, ends up running on both of the GPUs on his machine. When he runs two programs which should each run independently on the two different GPUs, he sees them both running on both GPUs and a significant performance decrease. Moving back to the non-beta drivers and CUDA 2.2 toolkit seems to resolve the problem. Any thoughts? Is this a beta driver issue perhaps? Or has there been a change in CUDA 2.3 that requires him to update his code to run properly? At this point we’re considering rolling back to CUDA 2.2 / stable driver base.

His comments below. Any help/advice appreciated.


When I run a CUDA program, I get pretty much the performance I expect. When I run two CUDA programs at the same time, it takes each program roughly three times as long.

I did some digging, and I think I know a symptom of the problem. I ran one program, then I ran the command:

lsof | grep nvidi | grep mem

Below is the output.

[codebox]
[samcho@host:~/sop/30s-od]: lsof | grep nvidi | grep mem
sop.x 28034 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28034 username mem CHR 195,1 7039 /dev/nvidia1
[/codebox]

When I ran two programs at the same time, I got the following output:

[codebox]
[username@host:~/sop/30s-od]: lsof | grep nvidi | grep mem
sop.x 28043 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28043 username mem CHR 195,1 7039 /dev/nvidia1
sop.x 28044 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28044 username mem CHR 195,1 7039 /dev/nvidia1
[/codebox]

Clearly, each program is using both cards at the same time, even though I specify they use only a single one.

No, each program is not using both cards at the same time. Each program opens /dev/nvidia0 and /dev/nvidia1 for device enumeration, but that does not mean it is running work on both cards. What’s more likely is that exclusive mode was enabled in the previous setup, and now both programs are using the same device.
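If it helps to compare runs, here is a rough Python sketch that groups the lsof lines posted above by PID. The function name `devices_by_pid` is made up for illustration, and the parsing assumes the column layout shown in this thread (command, PID, user, FD, ..., device path last). Keep in mind it only reports which device nodes a process has mapped, which by itself says nothing about which GPU is actually running kernels.

```python
from collections import defaultdict

def devices_by_pid(lsof_text):
    """Map each PID to the sorted /dev/nvidia* nodes it has memory-mapped.

    Hypothetical helper: assumes lsof lines shaped like the ones in this
    thread, with the PID in column 2, the FD in column 4, and the device
    path as the last column.
    """
    mapped = defaultdict(set)
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip blank lines, shell prompts, and anything that is not a "mem" entry.
        if len(fields) < 5 or fields[3] != "mem":
            continue
        name = fields[-1]
        if name.startswith("/dev/nvidia"):
            mapped[int(fields[1])].add(name)
    return {pid: sorted(devs) for pid, devs in mapped.items()}

# Sample input copied from the two-process run posted above.
sample = """\
sop.x 28043 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28043 username mem CHR 195,1 7039 /dev/nvidia1
sop.x 28044 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28044 username mem CHR 195,1 7039 /dev/nvidia1
"""

result = devices_by_pid(sample)
for pid, devs in sorted(result.items()):
    print(pid, devs)
```

Both PIDs listing both nodes is what you would expect if every process enumerates all devices at startup, so this output alone does not distinguish a correctly pinned program from a misbehaving one.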

Thanks for the response. I looked into this more deeply: the 2.2 installation was not set up in exclusive mode, yet it does not exhibit the problem. We tested 2.3 with Exclusive Mode enabled and still have the problem. Other thoughts?

Are you running nvidia-smi in a background loop after enabling exclusive mode?