Contexts: Performance question overhead by switching the context

beymar · February 5, 2009, 7:34pm

Hi,

I am writing a multithreaded application using the driver API where I have a pool of threads that are performing CUDA tasks. Each thread has its own context which does not change, so I do not use cuCtxPopCurrent.

I have data in page locked host memory and a thread performs the following operations:
cuMemcpyHtoD; Kernel; cuMemcpyDtoH;
Page locked host memory and device memory have been allocated within the thread's context. To measure the throughput, the kernel is empty.
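For anyone reproducing this measurement, the per-item sequence described above might look roughly like the sketch below. It is not the original poster's code. The embedded PTX for `empty_kernel`, the item count, the use of the primary context, and the `CHECK` macro are all assumptions made for illustration, and `cuLaunchKernel` is the current launch call, which is newer than this thread.

```cuda
#include <cuda.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: abort with the driver's error string on failure. */
#define CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) {   \
    const char *msg_ = NULL; cuGetErrorString(r_, &msg_);                  \
    fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "unknown");    \
    exit(1); } } while (0)

/* Assumed stand-in for the empty kernel; adjust .version/.target as needed. */
static const char *kEmptyKernelPtx =
    ".version 6.0\n"
    ".target sm_50\n"
    ".address_size 64\n"
    ".visible .entry empty_kernel()\n"
    "{\n"
    "    ret;\n"
    "}\n";

int main(void) {
    const size_t bytes = 100 * 1024;  /* 100 KB per item, as in the post */
    const int items = 1000;           /* assumed number of items */

    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK(cuCtxSetCurrent(ctx));
    CHECK(cuModuleLoadData(&mod, kEmptyKernelPtx));
    CHECK(cuModuleGetFunction(&fn, mod, "empty_kernel"));

    /* Page-locked host buffer and device buffer, both in this context. */
    void *h_buf; CUdeviceptr d_buf;
    CHECK(cuMemAllocHost(&h_buf, bytes));
    CHECK(cuMemAlloc(&d_buf, bytes));
    memset(h_buf, 0, bytes);

    CUevent start, stop;
    CHECK(cuEventCreate(&start, CU_EVENT_DEFAULT));
    CHECK(cuEventCreate(&stop, CU_EVENT_DEFAULT));

    CHECK(cuEventRecord(start, NULL));
    for (int i = 0; i < items; ++i) {
        CHECK(cuMemcpyHtoD(d_buf, h_buf, bytes));                  /* upload */
        CHECK(cuLaunchKernel(fn, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, NULL));
        CHECK(cuMemcpyDtoH(h_buf, d_buf, bytes));                  /* download */
    }
    CHECK(cuEventRecord(stop, NULL));
    CHECK(cuEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cuEventElapsedTime(&ms, start, stop));
    printf("%d items of %zu bytes in %.3f ms (%.1f MB/s each way)\n",
           items, bytes, ms, (double)bytes * items / (ms * 1e3));

    CHECK(cuEventDestroy(start));
    CHECK(cuEventDestroy(stop));
    CHECK(cuMemFree(d_buf));
    CHECK(cuMemFreeHost(h_buf));
    CHECK(cuModuleUnload(mod));
    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

Built against the driver library (for example by linking with `-lcuda`), this gives a single-thread baseline to compare against the multi-context runs.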

Now I want to process a set of data items of, let's say, 100 KB each. When I use one thread to process one item after another, performance is fine. When I use 2 threads that are selected round-robin, performance drops by roughly 50%. It does not decrease further if I use more threads.

Does anyone know how much overhead a context switch causes on the device? I thought the push/pop operations were costly on the CPU side, but not on the GPU side.

Thanks in advance,
Martin

tmurray · February 6, 2009, 2:38am

Context switching causes significant overhead on the GPU, as you’ve seen. cuCtxPush/PopCurrent are really the best ways to accomplish what you’re trying to do.
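As a rough illustration of that suggestion (a sketch under assumptions, not an official recipe), the worker threads can share one context and attach to it with `cuCtxPushCurrent` / `cuCtxPopCurrent` around each item, so the GPU never has to switch between contexts. The worker count, item count, the `cuMemsetD8` stand-in for the real copy/kernel/copy work, and the use of the primary context are all hypothetical choices. The mutex serializes the handoff; on CUDA 4.0 and later a context may be current to several threads at once, so the lock can be dropped there.

```cuda
#include <cuda.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: abort with the driver's error string on failure. */
#define CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) {   \
    const char *msg_ = NULL; cuGetErrorString(r_, &msg_);                  \
    fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "unknown");    \
    exit(1); } } while (0)

#define NUM_WORKERS 2                 /* assumed pool size */
#define ITEMS_PER_WORKER 4            /* assumed workload */
#define ITEM_BYTES (100 * 1024)

static CUcontext g_ctx;               /* the single context shared by the pool */
static pthread_mutex_t g_ctx_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < ITEMS_PER_WORKER; ++i) {
        pthread_mutex_lock(&g_ctx_lock);
        CHECK(cuCtxPushCurrent(g_ctx));        /* attach context to this thread */

        /* Stand-in for the real per-item work (HtoD copy, kernel, DtoH copy). */
        CUdeviceptr d_buf;
        CHECK(cuMemAlloc(&d_buf, ITEM_BYTES));
        CHECK(cuMemsetD8(d_buf, 0, ITEM_BYTES));
        CHECK(cuMemFree(d_buf));

        CUcontext popped;
        CHECK(cuCtxPopCurrent(&popped));       /* detach so another thread can use it */
        pthread_mutex_unlock(&g_ctx_lock);
    }
    printf("worker %d done\n", id);
    return NULL;
}

int main(void) {
    CUdevice dev;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    /* Retaining the primary context does not make it current on this thread,
       so there is nothing to pop here before the workers start. */
    CHECK(cuDevicePrimaryCtxRetain(&g_ctx, dev));

    pthread_t threads[NUM_WORKERS];
    int ids[NUM_WORKERS];
    for (int t = 0; t < NUM_WORKERS; ++t) {
        ids[t] = t;
        if (pthread_create(&threads[t], NULL, worker, &ids[t]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
    }
    for (int t = 0; t < NUM_WORKERS; ++t)
        pthread_join(threads[t], NULL);

    CHECK(cuDevicePrimaryCtxRelease(dev));
    return 0;
}
```

With a context made by `cuCtxCreate` instead, the creating thread would first call `cuCtxPopCurrent` to detach it, since creation leaves the new context current on that thread.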

beymar · February 6, 2009, 8:15am

Thanks for your quick reply.

But then I have another question: if I use multiple GPUs and have one context and one host thread for each GPU, would switching between these contexts for each data item also cause overhead? Or does it apply only to a single GPU?

tmurray · February 6, 2009, 2:29pm

It applies to a single GPU only.
