Programming issue: How to convert for loops into threads

sharath · February 12, 2010, 5:49pm

This is part of a bubble sort program that sorts numbers in ascending order. Some ‘for’ loops are tricky; see the second ‘for’ loop here. How do you implement something like this with threads?

for(j=1;j<n;j++)
{
    for(i=0;i<n-j;i++)   /* How do we implement such things in threads? */
    {
        if(a[i]>=a[i+1])
        {
            temp=a[i];
            a[i]=a[i+1];
            a[i+1]=temp;
        }
    }
}

Could you write this code in CUDA? I am a beginner, which is why I am practicing on small programs like this one. Please also suggest a good CUDA problem that I could work on to improve my CUDA skills.

Thanks for your time

YDD · February 12, 2010, 5:57pm

Many sorts don’t work well in parallel, due to the dependency issues you’re noting. The classic parallelisable sort is the merge sort. However, you don’t want to be doing that sort of thing in CUDA when you’re just learning. If this is for a bigger application and you need a sort, look into Thrust (my personal choice) or CUDPP. If you just want to learn CUDA in general, I’d start with BLAS-like routines: dot products, matrix-vector products and matrix-matrix products. Those tend to be easily parallelisable. Look through the SDK and the Programming Guide. The examples vary in quality, but some are very good.
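
As a rough illustration of the Thrust route mentioned above, here is a minimal sketch. The helper name sortOnGpu is made up for this example; thrust::device_vector, thrust::sort and thrust::copy are Thrust's own. It assumes the file is compiled with nvcc.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <vector>

// Hypothetical helper (not part of Thrust): sorts a host vector
// in ascending order on the GPU and returns the sorted copy.
std::vector<int> sortOnGpu(const std::vector<int>& host)
{
    // Copy the input into device memory.
    thrust::device_vector<int> dev(host.begin(), host.end());

    // thrust::sort chooses a parallel algorithm internally;
    // with no comparator given it sorts in ascending order.
    thrust::sort(dev.begin(), dev.end());

    // Copy the sorted data back to the host.
    std::vector<int> result(dev.size());
    thrust::copy(dev.begin(), dev.end(), result.begin());
    return result;
}
```

Calling sortOnGpu on a vector holding 5, 2, 9, 1, 5 should give back 1, 2, 5, 5, 9. For an application that just needs sorted data, this is likely a better starting point than a hand-written kernel.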

jjp · February 13, 2010, 12:20pm

Just take a look at the examples in the CUDA SDK and read the manuals. The “standard” parallelization of a for loop in CUDA (within a single thread block) looks like this:

for (int i = threadIdx.x; i < n; i += blockDim.x) doSomething(i);

This obviously only works if there are no dependencies between the iterations of the loop. If you want a parallel bubble sort, search for “odd-even transposition sort”. But just as sequential bubble sort is a really bad sorting algorithm, this one is a reasonable choice in only a few cases (the only situation I can imagine where it would be useful is sorting warp-sized arrays) :)
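
For practice, here is a minimal sketch of odd-even transposition sort written with the strided-loop pattern shown above. It is not a tuned implementation. It assumes a single thread block, since __syncthreads() only synchronises threads within one block. The names oddEvenSortKernel and oddEvenSort and the block size of 256 are assumptions made for this example, and error checking is left out.

```cuda
#include <cuda_runtime.h>

// Odd-even transposition sort of a[0..n-1] in ascending order.
// Sketch only: must be launched with exactly one block.
__global__ void oddEvenSortKernel(int *a, int n)
{
    // n phases are enough to sort n elements.
    for (int phase = 0; phase < n; ++phase) {
        // Even phases compare pairs (0,1), (2,3), ...
        // Odd phases compare pairs (1,2), (3,4), ...
        int first = phase & 1;

        // Strided loop over the pairs. The pairs within one phase
        // do not overlap, so the iterations are independent.
        for (int i = first + 2 * threadIdx.x; i + 1 < n; i += 2 * blockDim.x) {
            if (a[i] > a[i + 1]) {
                int temp = a[i];
                a[i]     = a[i + 1];
                a[i + 1] = temp;
            }
        }

        // Every swap of this phase must finish before the next phase.
        __syncthreads();
    }
}

// Hypothetical host wrapper: copies the array to the GPU,
// sorts it there and copies it back. No error checking.
void oddEvenSort(int *hostData, int n)
{
    int *dev = NULL;
    size_t bytes = n * sizeof(int);

    cudaMalloc((void **)&dev, bytes);
    cudaMemcpy(dev, hostData, bytes, cudaMemcpyHostToDevice);

    oddEvenSortKernel<<<1, 256>>>(dev, n);   // one block of 256 threads

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(hostData, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```

Calling oddEvenSort(a, n) on a host array sorts it in place. The inner loop of the original bubble sort cannot be split this way because each iteration depends on the swap made by the previous one. Here each phase touches disjoint pairs, which is what makes it safe to hand them to different threads. The total work is still O(n^2) compare-exchanges, so treat this as an exercise rather than a practical sort.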

sharath · February 13, 2010, 2:57pm

Thanks, you cleared up most of my doubts :)

sharath · February 13, 2010, 3:01pm

Okay, thanks. I will look into other algorithms.
