Questions about multiple CPU threads on a single device: multiple contexts?

Hello all,

I’m a newbie CUDA user.
I tried to rewrite some functions in my program to speed it up.
At first, I used the runtime API to do this, and the modified program worked well with a single CPU thread.
Since the program is designed to run with multiple CPU threads, I tried to use the driver API to share the data on the device (by pushing and popping contexts).

Currently, I implemented it in the following way:

  1. One thread “A” creates n CUDA contexts and allocates memory for each context.
  2. All n CPU threads are invoked simultaneously. Each CPU thread pushes its corresponding context and starts working (invoking kernels and transferring data between host and device).
  3. Repeat step 2 until all data are processed. Then thread “A” releases the memory and destroys the contexts created in step 1.
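In case it helps to be concrete, here is a minimal sketch of the workflow the steps above describe, assuming the pre-CUDA-4.0 driver API where a context can be current to only one CPU thread at a time. The buffer size, thread count, and the `worker` function body are placeholders, not my real code; error handling is reduced to one macro.

```c
/* Sketch: one context per worker thread, created by a setup thread and
 * handed to workers via push/pop.  Names and sizes are illustrative. */
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(call) do { CUresult r = (call); \
    if (r != CUDA_SUCCESS) { fprintf(stderr, "%s failed: %d\n", #call, (int)r); exit(1); } } while (0)

#define N_THREADS 4

static CUcontext ctxs[N_THREADS];
static CUdeviceptr bufs[N_THREADS];

/* Step 1: thread "A" creates one context (and one buffer) per worker. */
void setup(void)
{
    CUdevice dev;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    for (int i = 0; i < N_THREADS; ++i) {
        CHECK(cuCtxCreate(&ctxs[i], 0, dev));  /* context becomes current here */
        CHECK(cuMemAlloc(&bufs[i], 1 << 20));  /* ~1 MB per context */
        CHECK(cuCtxPopCurrent(NULL));          /* detach so a worker can push it */
    }
}

/* Step 2: each worker thread pushes "its" context, works, then pops it. */
void worker(int i)
{
    CHECK(cuCtxPushCurrent(ctxs[i]));
    /* ... cuMemcpyHtoD / kernel launches / cuMemcpyDtoH on bufs[i] ... */
    CHECK(cuCtxPopCurrent(NULL));
}

/* Step 3: thread "A" frees the memory and destroys each context. */
void teardown(void)
{
    for (int i = 0; i < N_THREADS; ++i) {
        CHECK(cuCtxPushCurrent(ctxs[i]));
        CHECK(cuMemFree(bufs[i]));
        CHECK(cuCtxPopCurrent(NULL));
        CHECK(cuCtxDestroy(ctxs[i]));
    }
}
```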

My questions are:

Q1: I have seen suggestions that using multiple contexts on a single GPU is not good. But is it still acceptable to create n CUDA contexts when n CPU threads work simultaneously, given that one context cannot be shared by multiple CPU threads at the same time?

Q2: The aforementioned method seems to work well if n is not too big.
But on my graphics card (a 9800GT with 512 MB), memory runs out when n > 13.
I did a simple experiment using the API cuMemGetInfo, and the free memory decreased by 40-50 MB each time I created one context (without allocating any other memory).
So even though I allocate less than 1 MB per context, the device still runs out of memory. Would anyone please give me some instructions/suggestions to solve this problem?
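For reference, this is roughly how I measure the per-context cost (a simplified sketch, not the exact program). Since cuMemGetInfo needs a current context, the "before" reading is taken from inside a first throwaway context; `size_t` is the modern cuMemGetInfo signature, older toolkits used `unsigned int`.

```c
/* Measure the free-memory drop caused by creating one empty context. */
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdevice dev;
    CUcontext probe, extra;
    size_t free0, free1, total;

    cuInit(0);
    cuDeviceGet(&dev, 0);

    cuCtxCreate(&probe, 0, dev);   /* first context, used for the queries */
    cuMemGetInfo(&free0, &total);

    cuCtxCreate(&extra, 0, dev);   /* one more context, nothing allocated */
    cuCtxPopCurrent(NULL);         /* pop it so probe is current again */
    cuMemGetInfo(&free1, &total);

    printf("per-context overhead: %lu KB\n",
           (unsigned long)((free0 - free1) / 1024));

    cuCtxDestroy(extra);
    cuCtxDestroy(probe);
    return 0;
}
```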

I'm very sorry if my description is unclear due to my poor English, and thank you very much for reading and replying.

Would anyone please share some suggestions or experiences?

I repeated the memory-cost experiment on another PC, and there the free memory decreased by about 33 MB each time I created one context (without allocating any other memory). I want to know what causes this per-context memory cost, and whether there is any way to reduce it.

Moreover, I have seen that there is a limit on the number of contexts per card (e.g., 16 on Windows). Is that still the case? Would the best approach be to modify the program to use fewer CPU threads, so that fewer contexts are created?

Thank you very much for reading and replying.