Parallel computing by cpu thread and gpu kernel

cheer37 · November 21, 2014, 2:23pm

Suppose that I have to execute a module 100 times, each time with different data.
In this situation, which of the 3 cases below is fastest, and why?

First, processing by a single CUDA kernel with 100 threads.
Second, processing by n CUDA kernels with m threads each, in n CUDA streams, where n * m = 100.
Third, processing by n CUDA kernels with m threads each, launched from n CPU cores, where n * m = 100.

Thanks in advance.

jjsk2 · November 21, 2014, 3:19pm

Only 100 threads? That is a very small count even for a single CUDA kernel. A GPU can execute hundreds of thousands of threads in parallel. Maybe you need to revise your algorithm…

cheer37 · November 21, 2014, 3:22pm

That was just an example to explain the situation.
OK, suppose that the number of threads is 1,000,000.
I want to know the principle.

jjsk2 · November 21, 2014, 4:25pm

Without any details on what you are trying to do: multiple streams of kernels, each working on an independent set of data, will give the best performance.

cheer37 · November 21, 2014, 5:05pm

Would you tell me what the reason is?

Robert_Crovella · November 21, 2014, 5:53pm

For efficiently written code, it’s generally better, from the kernel perspective, to launch a single kernel rather than n kernels. There are overheads associated with each kernel launch, and there may also be inefficiencies in separating the data.

However, if we consider data transfer as well, the answer can change. Suppose the problem operates on separable data, and the data transfer time is significant and can be hidden by overlapping copy and compute. Then it is better to break the work into several kernel launches, each operating on a portion of the data. “It is better” means it may have a faster execution time if there is significant overlap of copy and compute.
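A minimal sketch of the two cases, not a definitive implementation. The kernel `process` and the helper `run_chunked` are hypothetical names standing in for "the module". With `num_chunks == 1` this is the single-launch case. With `num_chunks > 1`, each chunk gets its own stream, so the copies of one chunk can overlap with the kernel of another, assuming the device supports concurrent copy and compute.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Hypothetical stand-in for "the module": each thread handles one element.
__global__ void process(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i] + 1.0f;
}

// Processes n elements in num_chunks pieces, one stream per piece.
// num_chunks == 1 is the single-launch case. Returns true if every result
// is correct. Error handling is abbreviated (no cleanup on early return).
bool run_chunked(int n, int num_chunks) {
    const int threads = 256;
    const size_t bytes = size_t(n) * sizeof(float);
    float *h_in = nullptr, *h_out = nullptr, *d_in = nullptr, *d_out = nullptr;
    // Pinned host memory is needed for async copies to overlap with kernels.
    if (cudaMallocHost(&h_in, bytes) != cudaSuccess ||
        cudaMallocHost(&h_out, bytes) != cudaSuccess ||
        cudaMalloc(&d_in, bytes) != cudaSuccess ||
        cudaMalloc(&d_out, bytes) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) h_in[i] = float(i % 1000);

    std::vector<cudaStream_t> streams(num_chunks);
    for (auto& s : streams) cudaStreamCreate(&s);

    const int chunk = (n + num_chunks - 1) / num_chunks;
    for (int c = 0; c < num_chunks; ++c) {
        const int off = c * chunk;
        const int len = std::min(chunk, n - off);
        if (len <= 0) break;
        const size_t len_bytes = size_t(len) * sizeof(float);
        // Copy in, compute, copy out: all queued on this chunk's stream.
        cudaMemcpyAsync(d_in + off, h_in + off, len_bytes,
                        cudaMemcpyHostToDevice, streams[c]);
        process<<<(len + threads - 1) / threads, threads, 0, streams[c]>>>(
            d_in + off, d_out + off, len);
        cudaMemcpyAsync(h_out + off, d_out + off, len_bytes,
                        cudaMemcpyDeviceToHost, streams[c]);
    }

    // Wait for all streams, then verify the results on the host.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == 2.0f * h_in[i] + 1.0f);

    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return ok;
}
```

Calling `run_chunked(1 << 20, 1)` and `run_chunked(1 << 20, 4)` and timing each (for example with a profiler) would show whether the overlap outweighs the extra launch overhead for a given problem. For a kernel this trivial, the transfers dominate, which is the situation where chunking can help.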

All of the above assumes a large problem size. 100 threads is not useful or sensible from a CUDA perspective.
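To get a rough sense of what “large” means on a particular GPU, one option is to ask the runtime how many threads the device can keep resident at once. This is only a sketch; `max_resident_threads` is a hypothetical helper name, and the figure it returns is an upper bound that real kernels may not reach because of register or shared memory limits.

```cuda
#include <cuda_runtime.h>

// Upper bound on threads resident on the device at one time:
// (number of SMs) x (max resident threads per SM). Returns -1 on error.
int max_resident_threads(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor;
}
```

A launch of 100 threads would occupy only a small fraction of whatever `max_resident_threads(0)` reports, leaving most of the device idle.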
