Multiple Devices in CUDA - Crash

eDave · July 16, 2009, 4:05pm

I am working on some code that is designed to run on multiple GPU devices… we have a computer with 4 cards on the way.
I am testing the code on a machine with 3 Tesla cards.

My code uses POSIX threads to launch four (or currently, three) CPU threads, each of which launches its own GPU kernels.
(Each thread calls cudaSetDevice(threadid), where threadid takes the values 0-3, or 0-2 currently.)
It runs successfully for 3 or 4 iterations, and then it halts, printing out that there is a CUDA error, usually “no CUDA-enabled device available,” but occasionally “invalid device symbol.”

I alternate between running two different kernels on each card.
I just don’t understand why it would run without a hitch several times, and then stop with such a cryptic error, or what I should do to remedy it. I would appreciate any help on this greatly!
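
For reference, the setup described above might look roughly like the sketch below. This is not the actual code from the project: the kernel names (kernelA, kernelB), the buffer size, the iteration count, MAX_GPUS, and the CHECK macro are all made up for illustration. It is written against the current runtime API, so it uses cudaDeviceSynchronize() where CUDA 2.x code would have used cudaThreadSynchronize(). Checking the result of every call and printing the thread id and line number should at least show which call is the first to report the error.

```cuda
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <cuda_runtime.h>

#define MAX_GPUS 16  // hypothetical upper bound, for illustration only

// Hypothetical helper: print which thread and source line failed, then abort.
#define CHECK(tid, call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "thread %d, line %d: %s\n", (tid),    \
                    __LINE__, cudaGetErrorString(err_));          \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Two placeholder kernels standing in for the two real ones.
__global__ void kernelA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

__global__ void kernelB(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// One CPU thread per GPU; every CUDA call for that GPU happens in here.
static void *worker(void *arg) {
    int tid = *(int *)arg;
    const int n = 1 << 20;
    const int block = 256;
    const int grid = (n + block - 1) / block;
    float *d_x = NULL;

    CHECK(tid, cudaSetDevice(tid));  // bind this thread to device `tid`
    CHECK(tid, cudaMalloc((void **)&d_x, n * sizeof(float)));
    CHECK(tid, cudaMemset(d_x, 0, n * sizeof(float)));

    for (int iter = 0; iter < 10; ++iter) {
        // Alternate between the two kernels on this card.
        if (iter % 2 == 0) kernelA<<<grid, block>>>(d_x, n);
        else               kernelB<<<grid, block>>>(d_x, n);
        CHECK(tid, cudaGetLastError());       // launch errors
        CHECK(tid, cudaDeviceSynchronize());  // errors during execution
    }

    CHECK(tid, cudaFree(d_x));
    return NULL;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 1) {
        fprintf(stderr, "no usable CUDA device found\n");
        return EXIT_FAILURE;
    }
    if (count > MAX_GPUS) count = MAX_GPUS;

    pthread_t threads[MAX_GPUS];
    int ids[MAX_GPUS];
    for (int i = 0; i < count; ++i) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0) {
            fprintf(stderr, "pthread_create failed for thread %d\n", i);
            return EXIT_FAILURE;
        }
    }
    for (int i = 0; i < count; ++i) pthread_join(threads[i], NULL);
    return 0;
}
```

Something like nvcc multi.cu -o multi -lpthread should build it, assuming the file is saved as multi.cu.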

gshi · July 16, 2009, 4:24pm

I had the same problem with the error “invalid device symbol” in memcpy for constant memory when dealing with multiple GPUs using multiple threads.

In the main thread, I use GPU 0 to do some computation, release GPU 0 (cudaThreadExit()), then launch n pthreads, where n is the number of GPUs available in the system. Each child thread runs on one GPU.

1) The code always works in CUDA 2.0.
2) With CUDA 2.2, the code sometimes works, sometimes fails with the “invalid device symbol” error.
3) With CUDA 2.3 beta, same as 2).

Later I moved the GPU computation from the main thread into child thread 0. Since then, the code works fine in CUDA 2.2.
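
A minimal sketch of that restructuring, under some assumptions: preliminaryKernel and perGpuKernel are hypothetical placeholders for the real computations, and ok() and MAX_GPUS are illustrative helpers, not anything from the original code. The point is only the shape: the main thread launches no kernels and allocates nothing on a GPU, and the work that used to run there now runs inside child thread 0 after that thread has called cudaSetDevice(0).

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

#define MAX_GPUS 16  // hypothetical upper bound, for illustration only

// Hypothetical stand-in for the computation that used to run in main().
__global__ void preliminaryKernel(int *flag) { *flag = 1; }

// Hypothetical stand-in for the per-GPU work.
__global__ void perGpuKernel(int *out, int tid) { *out = tid; }

// Illustrative helper: report a failed call and return false.
static bool ok(cudaError_t err, int tid, const char *what) {
    if (err == cudaSuccess) return true;
    fprintf(stderr, "thread %d: %s: %s\n", tid, what, cudaGetErrorString(err));
    return false;
}

static void *worker(void *arg) {
    int tid = *(int *)arg;
    int *d_val = NULL;
    int h_val = -1;

    if (!ok(cudaSetDevice(tid), tid, "cudaSetDevice")) return NULL;
    if (!ok(cudaMalloc((void **)&d_val, sizeof(int)), tid, "cudaMalloc"))
        return NULL;

    bool good = true;
    if (tid == 0) {
        // Work moved out of main(): it now runs in thread 0, on GPU 0.
        preliminaryKernel<<<1, 1>>>(d_val);
        good = ok(cudaGetLastError(), tid, "preliminaryKernel");
    }

    if (good) {
        perGpuKernel<<<1, 1>>>(d_val, tid);
        good = ok(cudaGetLastError(), tid, "perGpuKernel") &&
               ok(cudaMemcpy(&h_val, d_val, sizeof(int),
                             cudaMemcpyDeviceToHost), tid, "cudaMemcpy");
    }
    if (good) printf("thread %d on GPU %d read back %d\n", tid, tid, h_val);

    cudaFree(d_val);
    return NULL;
}

int main() {
    int n = 0;
    // The main thread only queries the device count; it runs no GPU work.
    if (cudaGetDeviceCount(&n) != cudaSuccess || n < 1) {
        fprintf(stderr, "no usable CUDA device found\n");
        return 1;
    }
    if (n > MAX_GPUS) n = MAX_GPUS;

    pthread_t threads[MAX_GPUS];
    int ids[MAX_GPUS];
    for (int i = 0; i < n; ++i) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0) {
            fprintf(stderr, "pthread_create failed for thread %d\n", i);
            return 1;
        }
    }
    for (int i = 0; i < n; ++i) pthread_join(threads[i], NULL);
    return 0;
}
```

If the other threads need the result of the preliminary step, they would have to wait for thread 0 to finish it (for example with a barrier or condition variable); that synchronization is left out of the sketch.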

Hope it helps

-gshi

jack · July 16, 2009, 8:09pm

Search the forums for ‘GpuWorker’. It’s a bit of code that another user (MisterAndersen42) wrote, designed to handle multiple GPUs efficiently. Perhaps you can use it in your project and save yourself some time and trouble.
