nested parallelism

domket · January 15, 2009, 6:01am

I am trying to implement an algorithm that needs to perform approximately 10,000 independent dot products on vectors of approximately 5,000 elements each. In a program written for a CPU, this would be a nested loop: the inner loop computes one dot product, while the outer loop traverses the 10,000 elements that need a dot product.

I am new to CUDA and GPGPU, and I am wondering what the best way to parallelise this would be. From what I understand, it is possible to parallelise the dot product itself and also to process the 10,000 elements simultaneously, but I am not sure whether both can be done at the same time. I would greatly appreciate it if someone could point me to a good example.

Sarnath · January 15, 2009, 8:50am

Each iteration of your inner FOR loop could be treated as a thread.

If you run out of threads, each thread can run a FOR loop of its own to cover the remaining iterations…
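For the dot-product case in the question, one way to combine both levels of parallelism might look like the sketch below: one block per dot product (the outer loop), with the threads of that block striding over the vector (the inner loop) and then summing their partial results in shared memory. The names `batchedDot`, `batchedDotHost` and `THREADS`, and the assumption that all vectors are packed contiguously one after another, are illustrative and not from the original posts. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed block size; must be a power of two for the reduction below.
constexpr int THREADS = 256;

// Assumed layout: a and b each hold gridDim.x vectors of length n, back to back.
__global__ void batchedDot(const float* a, const float* b, float* out, int n)
{
    __shared__ float partial[THREADS];

    // Outer loop: block blockIdx.x handles dot product number blockIdx.x.
    const float* x = a + (size_t)blockIdx.x * n;
    const float* y = b + (size_t)blockIdx.x * n;

    // Inner loop: there are fewer threads than elements, so each thread
    // runs its own FOR loop over every blockDim.x-th element.
    float sum = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        sum += x[i] * y[i];
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction of the per-thread sums in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = partial[0];
}

// Hypothetical host wrapper: copies the data over, launches one block per
// dot product, and copies the numVecs results back.
std::vector<float> batchedDotHost(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  int numVecs, int n)
{
    assert(a.size() == b.size());
    assert(a.size() == (size_t)numVecs * n);

    const size_t inBytes  = a.size() * sizeof(float);
    const size_t outBytes = (size_t)numVecs * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaMalloc(&dA, inBytes);
    cudaMalloc(&dB, inBytes);
    cudaMalloc(&dOut, outBytes);
    cudaMemcpy(dA, a.data(), inBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), inBytes, cudaMemcpyHostToDevice);

    batchedDot<<<numVecs, THREADS>>>(dA, dB, dOut, n);

    std::vector<float> out(numVecs);
    cudaMemcpy(out.data(), dOut, outBytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return out;
}
```

With roughly 10,000 vectors this launches roughly 10,000 blocks, which should keep the GPU busy. Whether one block per dot product is the best split for a given card is something to measure rather than assume.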

Recently, I parallelized code that had 4 nested FOR loops. We mapped 1 thread to each quadruple <outerForloopIndex, middleLoopIndex1, MiddleLoopindex2, InnerLoopIndex>. It just worked like a breeze.
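The code for that project is not shown here, but a mapping of this kind could be sketched roughly as follows: flatten the four loops into a single index, give each thread one flat index, and decode it back into the four loop counters with division and modulo. The kernel name `fourLoops`, the wrapper `runFourLoops`, the bounds `n0`..`n3` and the placeholder loop body are all assumptions made for illustration.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Each thread handles one (i, j, k, l) tuple of the four nested loops.
// The loop body here is a placeholder that just stores i + j + k + l.
__global__ void fourLoops(int* out, int n0, int n1, int n2, int n3)
{
    const long long total  = (long long)n0 * n1 * n2 * n3;
    const long long stride = (long long)gridDim.x * blockDim.x;
    long long idx = (long long)blockIdx.x * blockDim.x + threadIdx.x;

    // If there are more tuples than threads, each thread loops over several.
    for (; idx < total; idx += stride) {
        const int l = (int)(idx % n3);                          // innermost
        const int k = (int)((idx / n3) % n2);
        const int j = (int)((idx / ((long long)n3 * n2)) % n1);
        const int i = (int)(idx / ((long long)n3 * n2 * n1));   // outermost
        out[idx] = i + j + k + l;
    }
}

// Hypothetical host wrapper that runs the kernel and returns the results.
std::vector<int> runFourLoops(int n0, int n1, int n2, int n3)
{
    const long long total = (long long)n0 * n1 * n2 * n3;
    const int threads = 256;
    // Cap the grid size (an arbitrary choice); the stride loop covers the rest.
    const int blocks = (int)std::min<long long>((total + threads - 1) / threads, 1024);

    int* dOut = nullptr;
    cudaMalloc(&dOut, (size_t)total * sizeof(int));
    fourLoops<<<blocks, threads>>>(dOut, n0, n1, n2, n3);

    std::vector<int> out((size_t)total);
    cudaMemcpy(out.data(), dOut, (size_t)total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return out;
}
```

This only works directly when the iterations are independent of one another. A loop level that accumulates into a shared result, like the dot product above, needs a reduction step as well.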

Best regards,

Sarnath
