Dynamic Parallelism improvement

Guix · February 11, 2013, 3:24pm

Hi everyone,

To find out in which compute cases Dynamic Parallelism (DP) is worthwhile, I ran some tests on my Tesla K20:

Starting from the NVIDIA sample "cdpLUDecomposition", I implemented a "classical" version of the decomposition without DP (I just replaced the parent kernel with a C function and added some memory transfers). My implementation is rather ugly, but it works, and curiously it runs faster than the original sample (142 GFLOPS vs 82 GFLOPS on a 8192*8912 matrix)! OK, maybe the "cdpLUDecomposition" sample is just a demonstration of how to use DP, not of its potential.
I also tried DP on a homemade application that processes hundreds of matrices. The processing is the same for every matrix and is independent from one matrix to the next. So I wrote a main (parent) kernel in which each single-threaded block processes one matrix and launches child kernels. I also wrote another version in which the main (parent) kernel has only one single-threaded block and the matrices are dispatched along one dimension of the child kernels. In both cases, the "classical" version of my application is faster…
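
As a rough illustration of the two launch layouts described above, here is a minimal sketch. It is not the code that was benchmarked: the kernel names, the matrix size, and the per-element work (doubling each value) are placeholders chosen for the example, and error checking is omitted for brevity.

```cuda
// Assumed build line: nvcc -arch=sm_35 -rdc=true sketch.cu -lcudadevrt
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Placeholder per-element work on one n*n matrix (hypothetical).
__device__ void processElement(float *m, int n, int i)
{
    if (i < n * n) m[i] *= 2.0f;
}

// Child for layout 1: works on a single matrix.
__global__ void childOneMatrix(float *m, int n)
{
    processElement(m, n, blockIdx.x * blockDim.x + threadIdx.x);
}

// Layout 1: one single-threaded block per matrix; each block launches its own child grid.
__global__ void parentPerMatrix(float *all, int n)
{
    float *m = all + (size_t)blockIdx.x * n * n;
    childOneMatrix<<<(n * n + 255) / 256, 256>>>(m, n);
}

// Child for layout 2: blockIdx.y selects the matrix.
__global__ void childBatched(float *all, int n)
{
    float *m = all + (size_t)blockIdx.y * n * n;
    processElement(m, n, blockIdx.x * blockDim.x + threadIdx.x);
}

// Layout 2: a single single-threaded block launches one child grid covering all matrices.
__global__ void parentSingle(float *all, int n, int numMatrices)
{
    dim3 grid((n * n + 255) / 256, numMatrices);
    childBatched<<<grid, 256>>>(all, n);
}

int main()
{
    const int n = 64, numMatrices = 200;  // illustrative sizes only
    const size_t count = (size_t)numMatrices * n * n;
    std::vector<float> h(count, 1.0f);

    float *d = nullptr;
    cudaMalloc(&d, count * sizeof(float));
    cudaMemcpy(d, h.data(), count * sizeof(float), cudaMemcpyHostToDevice);

    parentPerMatrix<<<numMatrices, 1>>>(d, n);  // layout 1
    cudaDeviceSynchronize();
    parentSingle<<<1, 1>>>(d, n, numMatrices);  // layout 2
    cudaDeviceSynchronize();

    cudaMemcpy(h.data(), d, count * sizeof(float), cudaMemcpyDeviceToHost);
    printf("first element after both layouts: %f\n", h[0]);
    cudaFree(d);
    return 0;
}
```

In this sketch, layout 1 issues one device-side launch per matrix while layout 2 issues a single one. Device-side launches have their own overhead, which may be part of why a host-driven version can come out ahead when the child kernels are small.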

I am not defeatist: I would like to know whether anyone has an example where Dynamic Parallelism brings a real gain.
What are the characteristics of a suitable use case?

Thanks in advance,
Guix

JFSebastian · February 15, 2013, 8:40am

Please take a look at this discussion:

parallel processing - Dynamic programming in CUDA: global memory allocations to exchange data with child kernels - Stack Overflow

For an interpolation problem, I finally improved my results using dynamic parallelism.

Guix · February 15, 2013, 5:02pm

Hi, thanks for your answer, I will take a look!

Guix
