I have a question about GPU selection when programming with CUDA.
I am seeing some weird behavior in some CUDA code that I'm trying to debug (mostly not my code). In case it is relevant: I am primarily trying to track down what I believe are concurrency issues in the code that lead to deadlocks and incorrect results. For thoroughness, I will also mention that the application is supposed to use multiple host threads to do various calculations on the same GPU.
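To clarify the pattern I mean, here is a simplified, hypothetical sketch (not the actual application code): several host threads share the one device, each launching kernels on its own stream. The kernel and sizes below are just placeholders.

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Placeholder kernel standing in for the application's real computations.
__global__ void work(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = out[i] * 2.0f + 1.0f;
}

// Each host thread uses the same device (the current one, by default)
// but its own stream, so kernels from different threads can overlap.
void worker(int n) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    float *buf;
    cudaMalloc(&buf, n * sizeof(float));  // contents uninitialized; fine for a sketch
    work<<<(n + 255) / 256, 256, 0, stream>>>(buf, n);
    cudaStreamSynchronize(stream);
    cudaFree(buf);
    cudaStreamDestroy(stream);
}

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back(worker, 1 << 20);
    for (auto &th : threads) th.join();
    return 0;
}
```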
As part of my investigation, I ran nvprof like this:
nvprof --profile-child-processes --print-gpu-trace ./application
Looking at the trace, I see that nvprof believes everything is running on the system's single TITAN Xp, as should be the case. The curious thing is that when I look at activity with nvidia-smi, I see that the process is using two GPUs: the TITAN Xp and a GTX 1080 that is also in the system (devices 0 and 3, respectively). I should mention that the system actually has four GPUs in total: one TITAN Xp and three GTX 1080s.
As far as I know, there is no GPU selection whatsoever in the code, and the nvprof trace corroborates that. What should I make of the discrepancy with nvidia-smi's output? Could it be related to the deadlocks/incorrect results?
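For reference, the kind of explicit selection I looked for (and did not find) would look something like the sketch below; the device index 0 is just an example. My understanding is that without a cudaSetDevice() call, and with CUDA_VISIBLE_DEVICES unset, the runtime defaults to device 0 in each thread.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0, current = -1;
    cudaGetDeviceCount(&count);  // how many GPUs the runtime can see
    cudaSetDevice(0);            // explicit selection -- absent from the code in question
    cudaGetDevice(&current);     // which device subsequent calls in this thread will use
    printf("devices visible: %d, current: %d\n", count, current);
    return 0;
}
```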
Thanks for any clues!