CPU multithread with several devices

jeannot · May 12, 2011, 9:36am

Hello,
I have several CPU threads that each need to execute a CUDA kernel, and I have several CUDA devices.
I’d like to know how to manage this: what tools CUDA provides, how to choose which device each thread will run its kernel on, whether there is a way to tell if a device is currently in use, and so on.
thanks
jean

avidday · May 12, 2011, 1:57pm

CUDA doesn’t provide anything specific to handle multithreading. The traditional model is that each thread explicitly creates a context on its assigned device; subsequent CUDA operations then run within that context without any further changes.
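One way to picture the static binding described above is a fixed thread-to-device table built once at startup. The sketch below is plain C++ and the helper name `assign_devices` is hypothetical. In real code each thread would pass its entry to the runtime API's `cudaSetDevice` before doing any other CUDA work; that call appears only as a comment here so the snippet compiles without the toolkit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: build a fixed thread -> device table, round-robin.
// Thread t would call cudaSetDevice(device_of[t]) once, at startup,
// and never change it afterwards.
std::vector<int> assign_devices(int num_threads, int num_devices) {
    assert(num_threads >= 0 && num_devices > 0);
    std::vector<int> device_of(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        device_of[t] = t % num_devices;
    }
    return device_of;
}
```

With 5 threads and 2 devices, the table is 0, 1, 0, 1, 0.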

jeannot · May 12, 2011, 9:36pm

Hi, thank you for your answer. So I guess I’ll have to write some kind of “routing” class to distribute the threads across the devices.

Do you know if there is a way to monitor device activity, so that I can assign the next free device to the next idle thread?

jean

avidday · May 13, 2011, 8:48am

Prior to CUDA 4.0 it isn’t really possible to do what you are thinking of. The host thread-context-device affinity needs to be static, so you establish it at the beginning of execution and keep it until the end. This was the main source of pain in using OpenMP with CUDA for multi-GPU: most OpenMP runtimes keep a pool of threads and just pick the next free host thread to perform an action, so there was no guarantee that the physical host thread (and hence the device context) would stay the same throughout the life of an application. Ideally you want “consumer” threads that each hold a statically assigned GPU, with one or more “producer” threads generating work for them. There is also a context migration API in CUDA, which allows a context to be migrated from one thread to another.
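The producer/consumer arrangement can be sketched in plain C++ as follows. This is only an illustration under assumptions: `GpuWorkQueue` and `Job` are hypothetical names, one consumer thread is started per device, and the point where a real program would bind the thread to its GPU (for example with `cudaSetDevice`) is marked by a comment so the code builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A job receives the device index of the consumer that runs it.
using Job = std::function<void(int device)>;

class GpuWorkQueue {
public:
    // Start one consumer thread per device; the pairing never changes.
    explicit GpuWorkQueue(int num_devices) {
        assert(num_devices > 0);
        for (int dev = 0; dev < num_devices; ++dev) {
            consumers_.emplace_back([this, dev] { consume(dev); });
        }
    }

    // Producer side: enqueue a job for whichever consumer is free next.
    void submit(Job job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

    // Let the consumers drain the queue, then stop and join them.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_all();
        for (auto& t : consumers_) {
            if (t.joinable()) t.join();
        }
    }

    ~GpuWorkQueue() { shutdown(); }

private:
    void consume(int dev) {
        // Real code would bind this thread to its GPU here, once:
        // cudaSetDevice(dev);
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down, nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(dev);  // runs on the thread that owns device `dev`
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Job> jobs_;
    bool done_ = false;
    std::vector<std::thread> consumers_;
};
```

A producer would call `submit` with a callable that launches its kernel. Because jobs are pulled by whichever consumer is idle, work flows to the next free GPU without any thread ever changing device.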

In CUDA 4.0 the approach can be different, because it is now possible for a single host thread to establish multiple contexts and work with multiple devices directly. Although I haven’t really played with threaded code in CUDA 4.0 yet, the approach there might be to have a parent thread establish a context on each GPU, then pass devices to threads as required. I haven’t started migrating any threaded multi-GPU code to CUDA 4.0, though, so I can’t really offer any specific advice on that.
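As a rough illustration of "pass devices to threads as required", and of the "next free device" idea from the question, a small pool of device indices might look like the sketch below. `DevicePool` is a hypothetical class in plain C++, not part of CUDA. The assumption is that a thread calls `cudaSetDevice(dev)` after `acquire()` and finishes its GPU work before calling `release(dev)`. Note that "idle" here only means "not checked out of the pool"; nothing queries the hardware.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <vector>

class DevicePool {
public:
    explicit DevicePool(int num_devices) {
        assert(num_devices > 0);
        // Push in reverse so that device 0 is handed out first.
        for (int dev = num_devices - 1; dev >= 0; --dev) {
            free_.push_back(dev);
        }
    }

    // Block until some device is idle, then hand out its index.
    // The caller would then run cudaSetDevice(dev) on its own thread.
    int acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        int dev = free_.back();
        free_.pop_back();
        return dev;
    }

    // Return a device once the caller's GPU work has completed.
    void release(int dev) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(dev);
        }
        available_.notify_one();
    }

    // Number of devices currently not checked out.
    int idle_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return static_cast<int>(free_.size());
    }

private:
    std::mutex mutex_;
    std::condition_variable available_;
    std::vector<int> free_;
};
```

Whether this beats statically bound worker threads depends on the workload; it simply shows where the bookkeeping would live if devices are handed out on demand.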

HenrikAndresen · May 13, 2011, 9:16am

Hi Jeannot

It would be worth looking into the context management part of the CUDA driver API. This allows you to have several threads work on the same or different devices.

From personal experience I have found it easier to use worker threads that each maintain their own CUDA context, divide the jobs among these threads, and combine the results at the end. This all depends on your problem of course, but when I tried sharing contexts between threads I had to switch context so often that it was more trouble than it was worth.
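The divide-and-combine pattern could be sketched like this. It is a plain C++ stand-in rather than the code described above: the function name `sum_on_workers` is hypothetical, a host-side sum takes the place of the kernel, and the spot where each worker would select its GPU is marked with a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical example: sum `data` using one worker thread per device.
long long sum_on_workers(const std::vector<int>& data, int num_workers) {
    assert(num_workers > 0);
    std::vector<long long> partial(num_workers, 0);
    std::vector<std::thread> workers;
    // Ceiling division so every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + num_workers - 1) / num_workers;

    for (int w = 0; w < num_workers; ++w) {
        workers.emplace_back([&data, &partial, chunk, w] {
            // Real code: cudaSetDevice(w) once, then copy this slice to the
            // device and launch the kernel on it.
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& t : workers) t.join();

    // Combine step: runs on the host after every worker has finished.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Each worker writes only its own slot of `partial`, so no locking is needed until the final combine.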

Cheers

Henrik
