Hey folks, we recently upgraded our Linux machines – RHEL5, x86_64 – to the CUDA 2.3 toolkit and 190.18 Beta drivers. One of our users reports problems with his code after the upgrade. Essentially, his code, which is supposed to run on a single device, ends up running on both of the GPUs on his machine. When he runs two programs that should each run independently on the two different GPUs, he sees them both running on both GPUs, along with a significant performance decrease. Moving back to the non-beta drivers and the CUDA 2.2 toolkit seems to resolve the problem. Any thoughts? Is this a beta driver issue, perhaps? Or has there been a change in CUDA 2.3 that requires him to update his code to run properly? At this point we’re considering rolling back to the CUDA 2.2 / stable driver base.
His comments are below. Any help/advice is appreciated.
When I run a CUDA program, I get pretty much the performance I expect. When I run two CUDA programs at the same time, it takes each program roughly three times as long.
I did some digging, and I think I found a symptom of the problem. I ran one program, then I ran the command:
lsof | grep nvidi | grep mem
Below is the output.
[username@host:~/sop/30s-od]: lsof | grep nvidi | grep mem
sop.x 28034 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28034 username mem CHR 195,1 7039 /dev/nvidia1
When I ran two programs at the same time, I got the following output:
[username@host:~/sop/30s-od]: lsof | grep nvidi | grep mem
sop.x 28043 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28043 username mem CHR 195,1 7039 /dev/nvidia1
sop.x 28044 username mem CHR 195,0 7021 /dev/nvidia0
sop.x 28044 username mem CHR 195,1 7039 /dev/nvidia1
Clearly, each program has opened and memory-mapped both cards at the same time, even though I specify that each should use only a single one.
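For reference, the device selection in the code follows the standard CUDA runtime-API pattern, roughly as sketched below (this is a minimal illustration, not the actual sop.x source; the device index would normally come from a command-line argument):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    // Pin this process to a single GPU before any other CUDA call.
    // The hard-coded index 0 is a placeholder; each of the two
    // program instances would pass a different index (0 or 1).
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Confirm which device the runtime actually selected.
    int dev = -1;
    cudaGetDevice(&dev);
    printf("Running on device %d\n", dev);

    // ... allocate memory and launch kernels on the selected device ...
    return 0;
}
```

With this pattern, each instance should only ever issue work to its own device, which is why the lsof output showing both /dev/nvidia0 and /dev/nvidia1 mapped in every process looks suspicious.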