Running multiple processes on a GPU causes it to get stuck

wangx0800 · February 2, 2010, 5:16pm

I am running on Linux with CUDA 2.3 on a Tesla/C1060 card. I have two different kernels (say K1 and K2), each running within a separate Linux process (P1 and P2). The CPU/GPU interface is pretty straightforward: cudaSetDevice, memcpy from CPU to GPU, kernel call, cudaThreadSynchronize, memcpy from GPU to CPU…
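For readers following along, the call sequence described in this post might look roughly like the sketch below. It is not the poster's code: `scale_kernel`, `check`, and `run_once` are hypothetical names, the kernel is a trivial stand-in for K1/K2, and the buffer is assumed to be non-empty. It uses `cudaDeviceSynchronize`, which replaced the now-deprecated `cudaThreadSynchronize` in later CUDA releases. Checking the status after both the launch and the synchronize is where an error such as "unspecified launch failure" would be reported.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the poster's K1/K2: doubles every element.
__global__ void scale_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Print a message if a CUDA call failed, then pass the status back.
static cudaError_t check(cudaError_t err, const char *what) {
    if (err != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return err;
}

// One iteration of the loop: set device, copy in, launch, sync, copy out.
// Assumes host is non-empty. Returns the first error encountered.
cudaError_t run_once(int device, std::vector<float> &host) {
    const int n = static_cast<int>(host.size());
    const size_t bytes = n * sizeof(float);
    float *dev = nullptr;
    cudaError_t err;

    if ((err = check(cudaSetDevice(device), "cudaSetDevice")) != cudaSuccess) return err;
    if ((err = check(cudaMalloc(&dev, bytes), "cudaMalloc")) != cudaSuccess) return err;

    err = check(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice),
                "memcpy host->device");
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(dev, n);
        // Launch-time errors (bad configuration and similar) show up here.
        err = check(cudaGetLastError(), "kernel launch");
    }
    if (err == cudaSuccess) {
        // Errors raised while the kernel runs are reported at the sync point.
        err = check(cudaDeviceSynchronize(), "kernel execution");
    }
    if (err == cudaSuccess) {
        err = check(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost),
                    "memcpy device->host");
    }
    cudaFree(dev);
    return err;
}
```

Calling `run_once(0, buffer)` repeatedly from `main` would approximate one of the tight loops described here; logging the returned status per iteration makes it easier to see which process and which call fails first.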

On a single GPU, I run P1 or P2 or P1+P2 in tight loops for days/weeks and everything is fine. However, as soon as I add another instance of P2 into the picture (P1+P2+P2), either with all three running in tight loops or with the second P2 as a periodic probe, things begin to fall apart. I get sporadic "unspecified launch failure" errors (mostly from P2/K2), and eventually one of the processes gets stuck in either the “R” or “D” state and no valuation can be processed anymore. Basically all processes on the GPU get stuck, and the only way to recover the GPU is a reboot. EIPs from the stuck processes suggest they are stuck in either ioctl() or cudbgIpcCall(). CUDA 2.2 had the same problem.

What could be the possible cause?

seibert · February 2, 2010, 8:57pm

tmurray has mentioned in previous threads that multiple processes sharing one CUDA device can potentially hit race conditions or deadlocks due to driver bugs. This is supposedly improved in later CUDA releases, though. Can you test the CUDA 3.0 beta?

tmurray · February 2, 2010, 9:21pm

Even beyond bugs, there’s no performance reason to do what you’re trying to do. Context switching between different GPU contexts is expensive.

wangx0800 · February 2, 2010, 9:59pm

Understood. However, the setup is due to factors other than performance considerations.

My driver is /usr/lib64/libcuda.so.190.18. So this version does have some bugs in this scenario? Was I just lucky when I had two processes (P1+P2) running without any problem? Does this problem have anything to do with how much of the GPU's resources (global memory, shared memory…) these kernels use? Thanks!

tmurray · February 2, 2010, 10:33pm

190.18 has plenty of known issues at this point; please try with the latest 195/196.xx Linux driver (I can’t keep track of it).

jarjar · March 16, 2010, 7:29pm

How is context switching between different GPU contexts handled if I launch two processes, each computing a certain function on the GPU?

When is the context switch carried out, and how expensive is this operation?

Could you explain what happens under the following conditions?

(1) Kernel context switching between different GPU contexts

(2) Kernel context switching within one GPU context
