I’m a bit confused. I’m working on a project for which I have a system with two GPUs: a Quadro 600 for display purposes and a GTX Titan for computing.
I want to store data in constant memory (on the Titan) using cudaMemcpyToSymbol, and my code seems to work properly. That is: I can write data and read it back, the input data matches the output, and CUDA doesn’t return any errors.
Later I found out I forgot to select one of the GPUs (the Titan), so I’m very surprised the code worked at all. Where does the data go?
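For context, the relevant part of my code is roughly the sketch below (the symbol name, size and device index are simplified placeholders); the cudaSetDevice call at the top is the one I had forgotten:

```
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder constant-memory symbol; my real data is larger.
__constant__ float d_coeffs[256];

int main()
{
    // This is the call I had left out: without it, the runtime
    // silently uses device 0 (the default device).
    cudaSetDevice(1);  // index of the Titan on my system (placeholder)

    float h_coeffs[256];
    for (int i = 0; i < 256; ++i) h_coeffs[i] = (float)i;

    // Copy to constant memory, then read it back to verify.
    cudaMemcpyToSymbol(d_coeffs, h_coeffs, sizeof(h_coeffs));

    float h_check[256];
    cudaMemcpyFromSymbol(h_check, d_coeffs, sizeof(h_check));

    printf("round trip ok: %d\n", h_check[255] == h_coeffs[255]);
    printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```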
Note that if you have several GPUs, you should not rely on the driver’s ordering to select the “best” card. On a system with 4 GPUs, I see this ordering:
My guess is that the driver tries to ensure that CUDA device 0 is not the display device when there is more than one device available, but the ordering beyond that is arbitrary.
I don’t recall seeing the device order change after switching versions of CUDA (this particular computer has had CUDA 4-5.5 on it), but given the lack of specification, it is probably best to assume that it could change.
In my code I have a couple of lines to ensure my calculations are performed on the Titan. Basically, I query the device properties and select the device named “Titan”.
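For what it’s worth, the selection logic is just a few lines along these lines (error checking omitted; “TITAN” is the substring I match against):

```
#include <cstring>
#include <cuda_runtime.h>

// Pick the first device whose name contains the given substring,
// e.g. "TITAN"; returns the chosen device index, or -1 if none matches.
int selectDeviceByName(const char* substring)
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        if (strstr(prop.name, substring) != NULL) {
            cudaSetDevice(dev);
            return dev;
        }
    }
    return -1;  // no matching device found
}
```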
Maybe I should look into using NVML.
What I find surprising is that there is no straightforward way to select a device based on a unique serial number or something similar.
Also, CUDA deciding which device is fastest and making it device 0 seems a bit arbitrary. What happens if I have, for example, a system with multiple Titans? Which of them will then become device 0? And is the enumeration the same every time I start the system?
I think my example above shows that CUDA does not map the fastest card to device 0 in general. The GTX 680 is, for many (but not all) applications, a better card than the GTX 580, yet it is device 2.
The device properties structure is pretty extensive, and should let you create a device selection heuristic appropriate for your application based on compute capability, memory size, # of CUDA cores, whether or not a display is connected, etc. I don’t think you’ll need to use NVML to pick a CUDA device.
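A minimal sketch of such a heuristic might look like the following (the particular criteria here, a minimum compute capability followed by the largest global memory, are just placeholders for whatever matters to your application):

```
#include <cuda_runtime.h>

// Hypothetical heuristic: among devices with compute capability >= 3.0,
// pick the one with the largest amount of global memory.
int pickDeviceByHeuristic()
{
    int count = 0, best = -1;
    size_t bestMem = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        if (prop.major < 3) continue;            // skip older architectures
        if (prop.totalGlobalMem > bestMem) {     // prefer more memory
            bestMem = prop.totalGlobalMem;
            best = dev;
        }
    }
    return best;  // -1 if nothing qualified
}
```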
There does not appear to be a unique card serial number in the device property structure, but it does have fields for the PCI Express “coordinates” (domain, bus, device) of the device, which should be stable as long as the card is not moved to a different slot in the computer.
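If you already know the PCI address of the card you want, the runtime can also look it up directly; a short sketch (the bus-ID string below is just an example, substitute the one reported by deviceQuery or nvidia-smi):

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Resolve a device index from its PCI address.
    int dev = -1;
    if (cudaDeviceGetByPCIBusId(&dev, "0000:03:00.0") == cudaSuccess) {
        cudaSetDevice(dev);

        // The same coordinates are also exposed as device properties.
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: domain %d, bus %d, slot %d\n",
               dev, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}
```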
You’re right about using the device properties to select the appropriate device. At the moment selection based on the device name suffices, but in the future I may have to resort to using the PCI bus ID etc. as well.
I’m curious, what parameter in the device properties structure indicates whether a display is connected to the device or not?
I agree with seibert that it is more probable that device IDs are assigned according to the “physical location” of the device rather than to performance heuristics, contrary to the answer at
where there is a code snippet to select the card with the largest number of multiprocessors; some CUDA SDK multi-GPU examples (p2p) also include code to make such a selection.
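The idea behind that snippet is roughly the following (my own sketch, not a verbatim copy of the SDK code):

```
#include <cuda_runtime.h>

// Pick the device with the largest number of multiprocessors,
// in the spirit of the SDK's gpuGetMaxGflopsDeviceId helper.
int maxMultiprocessorDevice()
{
    int count = 0, best = 0, bestSMs = -1;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        if (prop.multiProcessorCount > bestSMs) {
            bestSMs = prop.multiProcessorCount;
            best = dev;
        }
    }
    return best;
}
```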
Today I installed a PC with a Tesla C2050 card for computation and an old 8084 GS card for visualization, and tried switching their positions between the first two PCI-E slots. Using deviceQuery, I noticed that GPU 0 is always the card in the first PCI-E slot and GPU 1 is always the card in the second PCI-E slot. I do not know whether this is a general rule, but it shows that, at least on my system, GPUs are numbered according to their position, not their “power”.