Running CUDA kernels from two different pthreads

KapilMehta · May 9, 2016, 1:07pm

I am trying to run CUDA kernels inside two different pthreads. From what I have read, if we don't specify an SM, the work goes to the default one, SM0…

I have tried running it on the default SM and on a specific SM, using the nvcc --default-stream per-thread flag as described at https://devblogs.nvidia.com/parallelforall/gpu-pro-tip-cuda-7-streams-simplify-concurrency/

But when I run it, the kernels don't execute at all…
Can somebody guide me on how to run CUDA kernels in two different pthreads on two different SMs?

Any example would help.

Thanks in advance…

Robert_Crovella · May 9, 2016, 1:10pm

What do you mean by “two different SMs”? What do you mean by SM specifically?

In my understanding, SM means streaming multiprocessor, which is a CUDA GPU hardware building block (component).

CUDA programs (whether multi-threaded or not) don’t normally target specific SMs for execution, and using CUDA Streams also does not target specific SMs.
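To make the point above concrete, here is a minimal sketch of two pthreads that each launch a kernel on a stream of their own. The kernel add_value, the struct WorkerArgs and the function run_two_host_threads are hypothetical names chosen for illustration, not anything from this thread. Note that nothing in the code selects an SM: the streams only express which work is independent, and the hardware decides where the blocks run.

```cuda
#include <pthread.h>
#include <cuda_runtime.h>

// Hypothetical kernel: adds a constant to every element.
__global__ void add_value(int *data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

// Hypothetical per-thread arguments.
struct WorkerArgs {
    int value;   // constant this thread's kernel adds
    int result;  // first element copied back, or -1 on error
};

// Each host thread creates its own stream and issues all of its work there.
static void *worker(void *p)
{
    WorkerArgs *args = static_cast<WorkerArgs *>(p);
    const int n = 1 << 16;
    args->result = -1;

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return nullptr;

    int *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(int)) == cudaSuccess) {
        int first = -1;
        cudaMemsetAsync(d_data, 0, n * sizeof(int), stream);
        add_value<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n, args->value);
        cudaMemcpyAsync(&first, d_data, sizeof(int),
                        cudaMemcpyDeviceToHost, stream);
        // Synchronizing the stream also reports errors from the work above.
        if (cudaStreamSynchronize(stream) == cudaSuccess) args->result = first;
        cudaFree(d_data);
    }
    cudaStreamDestroy(stream);
    return nullptr;
}

// Starts two host threads and returns true if both kernels completed.
bool run_two_host_threads(int results[2])
{
    pthread_t threads[2];
    WorkerArgs args[2] = {{1, -1}, {2, -1}};
    int created = 0;
    for (int t = 0; t < 2; ++t) {
        if (pthread_create(&threads[t], nullptr, worker, &args[t]) != 0) break;
        ++created;
    }
    for (int t = 0; t < created; ++t) pthread_join(threads[t], nullptr);
    results[0] = args[0].result;
    results[1] = args[1].result;
    return created == 2 && results[0] >= 0 && results[1] >= 0;
}
```

Call run_two_host_threads from main and link with pthreads when building with nvcc. Explicit streams are used here so the sketch does not depend on any compile flag; compiling with --default-stream per-thread and launching without a stream argument would be an alternative way to get one stream per host thread.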

KapilMehta · May 9, 2016, 1:40pm

Thanks for your quick response…

Yes, by SM I do mean streaming multiprocessor…

OK, I get your point… I had misunderstood the concepts of streams and SMs, but:

How many streams can we create?
Do we have any control over SMs? How can we utilize the SMs optimally?

Robert_Crovella · May 9, 2016, 2:13pm

You can create a lot of streams. I don’t know how many, but there is no practical limit AFAIK. In most cases, it should not be necessary to create more than 3-4 streams per device, since streams can be reused.
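As an illustration of reusing a handful of streams, here is a minimal sketch that spreads many independent kernel launches round-robin over four streams. The kernel scale, the function process_chunks and the choice of four streams are assumptions made for this example only.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element of one chunk by a factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Processes `chunks` independent chunks of `chunk_elems` floats each,
// reusing a small fixed set of streams instead of one stream per chunk.
bool process_chunks(float *d_data, int chunk_elems, int chunks, float factor)
{
    const int kNumStreams = 4;  // assumed pool size, not a CUDA limit
    cudaStream_t streams[kNumStreams];

    int created = 0;
    for (; created < kNumStreams; ++created) {
        if (cudaStreamCreate(&streams[created]) != cudaSuccess) break;
    }
    bool ok = (created == kNumStreams);

    if (ok) {
        const int threads = 256;
        const int blocks = (chunk_elems + threads - 1) / threads;
        for (int c = 0; c < chunks; ++c) {
            // Chunk c goes to stream c % kNumStreams; work in the same
            // stream runs in issue order, different streams may overlap.
            float *chunk = d_data + static_cast<size_t>(c) * chunk_elems;
            scale<<<blocks, threads, 0, streams[c % kNumStreams]>>>(
                chunk, chunk_elems, factor);
        }
        ok = (cudaGetLastError() == cudaSuccess);
    }

    // Wait for all issued work, then release the streams that were created.
    for (int s = 0; s < created; ++s) {
        if (cudaStreamSynchronize(streams[s]) != cudaSuccess) ok = false;
        cudaStreamDestroy(streams[s]);
    }
    return ok;
}
```

The pointer d_data is expected to be device memory holding chunks * chunk_elems floats, allocated by the caller with cudaMalloc.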

You have very little control over scheduling of work onto SMs. For those who are trying to do unusual things, there are some tricks that can be played:

https://devtalk.nvidia.com/default/topic/932044/cuda-programming-and-performance/how-to-limit-number-of-cuda-cores/

but these would not be ordinary CUDA programming techniques. In general, the machine handles scheduling of work onto SMs, and you have essentially no control over the detailed scheduling.
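Although placement on SMs cannot be controlled, the runtime can report how many SMs a device has and how many blocks of a given kernel can be resident on one SM, which is a usual starting point for sizing launches. The sketch below assumes device 0; the kernel dummy_kernel and the function report_sm_info are hypothetical names for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only so the occupancy query has a target.
__global__ void dummy_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2.0f * data[i];
}

// Returns the number of SMs on device 0, or -1 on error. On success,
// *blocks_per_sm receives how many blocks of dummy_kernel with
// `block_size` threads can be resident on one SM at the same time.
int report_sm_info(int block_size, int *blocks_per_sm)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;

    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            blocks_per_sm, dummy_kernel, block_size, 0) != cudaSuccess) {
        return -1;
    }

    printf("%s: %d SMs, up to %d resident blocks per SM at %d threads/block\n",
           prop.name, prop.multiProcessorCount, *blocks_per_sm, block_size);
    return prop.multiProcessorCount;
}
```

A grid with at least multiProcessorCount * blocks_per_sm blocks gives the scheduler enough work to occupy every SM; which block lands on which SM is still up to the hardware.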

BulatZiganshin · May 9, 2016, 3:17pm

Kepler and later GPUs can execute grids from up to 32 streams simultaneously, so it’s better to limit the number of streams to that amount.

ldaddr · May 9, 2016, 4:09pm

Can someone point to an example of multiple CPU processes interacting with multiple GPU threads over multiple streams?

Robert_Crovella · May 9, 2016, 4:35pm

Multi-process CUDA sample code:

http://docs.nvidia.com/cuda/cuda-samples/index.html#simpleipc

Multi-thread CUDA sample code:

http://docs.nvidia.com/cuda/cuda-samples/index.html#cudaopenmp

Multi-stream CUDA sample code:

http://docs.nvidia.com/cuda/cuda-samples/index.html#simplestreams

KapilMehta · May 10, 2016, 8:04am

I am running two pthreads, thread1 and thread2. thread1 launches 5 to 6 kernels and thread2 launches only one kernel.

The kernel launches in thread1 do not use streams, while thread2 launches its kernel on a stream.

After the application starts, everything works fine for a while, but then no more kernels are launched and the application hangs…

What could be preventing the kernel launches in this case?
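The thread does not contain a diagnosis for this hang. A general first step, offered here as a sketch and not as the confirmed cause, is to check the error status after every launch and every synchronization, because an earlier failed kernel can make all later launches fail silently. It is also worth knowing that kernels launched without a stream go to the legacy default stream, which implicitly synchronizes with ordinary (blocking) streams created by other threads; a stream created with cudaStreamNonBlocking opts out of that. The macro CUDA_CHECK, the kernel touch and the function checked_launch are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the failing call and returns false from the
// enclosing function. A sketch only: resources allocated before the failing
// call are not released on this early-return path.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,       \
                    __LINE__, #call, cudaGetErrorString(err_));       \
            return false;                                             \
        }                                                             \
    } while (0)

// Hypothetical kernel: writes each element's index.
__global__ void touch(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Launches one kernel on a non-blocking stream and checks every step.
// Expects n > 0.
bool checked_launch(int n)
{
    int *d_data = nullptr;
    cudaStream_t stream;

    // Non-blocking: no implicit synchronization with the legacy default stream.
    CUDA_CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(int)));

    touch<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());             // invalid launch configuration etc.
    CUDA_CHECK(cudaStreamSynchronize(stream));  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    CUDA_CHECK(cudaStreamDestroy(stream));
    return true;
}
```

Applying the same checks to every launch in both threads should show whether a launch or an earlier kernel is returning an error at the point where the application stops making progress.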
