CUDA Pro Tip: Always Set the Current Device to Avoid Multithreading Bugs

Originally published at: https://developer.nvidia.com/blog/cuda-pro-tip-always-set-current-device-avoid-multithreading-bugs/

We often say that to reach high performance on GPUs you should expose as much parallelism in your code as possible, and we don’t mean just parallelism within one GPU, but also across multiple GPUs and CPUs. It’s common for high-performance software to parallelize across multiple GPUs by assigning one or more CPU threads to each GPU. In this post I’ll cover a…
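For context, the pattern the post is about looks roughly like this: one CPU worker thread per GPU, with each thread calling cudaSetDevice() for its own device before doing any GPU work. This is only a minimal sketch I put together, not code from the article; dummyKernel and the buffer size are placeholders.

```cpp
// Minimal sketch (not the article's code): one CPU thread per GPU, and each
// thread sets its own current device before touching that GPU.
#include <cuda_runtime.h>
#include <thread>
#include <vector>

__global__ void dummyKernel(float *data, int n) {  // placeholder kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void workerThread(int device) {
    // The current device is a per-thread setting, so every thread that issues
    // GPU work has to call cudaSetDevice itself.
    cudaSetDevice(device);

    const int n = 1 << 20;                 // arbitrary size for illustration
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
}

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    std::vector<std::thread> threads;
    for (int d = 0; d < deviceCount; ++d)
        threads.emplace_back(workerThread, d);
    for (auto &t : threads) t.join();
    return 0;
}
```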

I hit this bug myself: my program always crashed in MPI while receiving a message. The most confusing part was that it did not crash when the receive buffer was small, which led me to suspect the stability of MPI itself.

Even more frustrating, about a week earlier I had hit another bug, an Open MPI issue that is more or less related to this one: https://github.com/open-mpi...
That issue was treated as an Open MPI bug and fixed in a later version.

After spending days tracking down the bug, I really wish I had seen this post earlier!

Is there any measurable performance impact from calling cudaSetDevice() unnecessarily? I would hope that if the current device is already 1, then calling cudaSetDevice(1) would just return right away without doing anything significant like talking to the GPU over the PCI bus. Is that how it actually works?
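One way to get a feel for the cost yourself is a quick timing loop like the sketch below. This is my own rough benchmark idea, not something from the post; the iteration count and timing method are arbitrary, and the numbers will vary by driver and system.

```cpp
// Rough sketch for timing redundant cudaSetDevice calls on a device that is
// already current. Methodology and iteration count are arbitrary choices.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    cudaSetDevice(0);   // make device 0 current
    cudaFree(0);        // force context creation so we don't time initialization

    const int iters = 1000000;
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iters; ++i)
        cudaSetDevice(0);   // device is already 0, so this should be cheap
    auto end = std::chrono::high_resolution_clock::now();

    double ns = std::chrono::duration<double, std::nano>(end - start).count() / iters;
    printf("avg cudaSetDevice(0) when already current: %.1f ns\n", ns);
    return 0;
}
```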

I’d also like to know about the overhead of unnecessarily calling cudaSetDevice().
My problem is that I don’t actually know where my calls end up being directed to device 0, since I only use the NPP and nvJPEG libraries.
For now I’ve added cudaSetDevice() calls all over the place, because I don’t know which ones are actually needed.
I tried this with two Quadro 2x00-series boards, which is where the symptom showed up.
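For illustration, here is a minimal sketch of the per-thread pattern I have been trying: NPP and nvJPEG calls run on the calling thread's current device (which defaults to device 0), so I set the device once at the start of each worker thread, before creating any library handles. decodeWorker and the device count are just placeholders.

```cpp
// Sketch only: pin each worker thread to one GPU by setting the current device
// at the top of the thread, before any NPP / nvJPEG work.
#include <cuda_runtime.h>
#include <thread>
#include <vector>

void decodeWorker(int device) {
    cudaSetDevice(device);   // without this, the thread's work defaults to device 0

    // ... create nvJPEG handles / NPP buffers and do the per-image work here;
    //     it all targets `device` as long as this thread doesn't change the
    //     current device again ...
}

int main() {
    std::vector<std::thread> workers;
    for (int d = 0; d < 2; ++d)          // e.g. the two Quadro boards
        workers.emplace_back(decodeWorker, d);
    for (auto &w : workers) w.join();
    return 0;
}
```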

I have never seen cudaSetDevice be a major performance limiter. As far as I know, the update is local to the calling thread and requires no synchronization.