Programming issue: How to convert for loops into threads

This is part of a program for the bubble sort technique. The program sorts numbers in ascending order. Some ‘for’ loops are tricky; see the second ‘for’ loop here. How do you implement something like this in threads?



for(i=0;i<n-j;i++) ----- How do we implement such things in threads?










Could you write this code in CUDA? I am a beginner, which is why I am practicing with small programs like these. Please also suggest a good CUDA problem that I could work on to improve my CUDA skills.

Thanks for your time

Many sorts don’t work well in parallel, due to the dependency issues you’re noting. The classic parallelisable sort is the merge sort. However, you don’t want to be doing that sort of thing in CUDA when you’re just learning. If this is for a bigger application, and you need a sort, look into Thrust (my personal choice) or CUDPP. If you just want to learn CUDA in general, I’d start with BLAS-like routines: dot products, matrix-vector products and matrix-matrix products. Those tend to be easily parallelisable. Look through the SDK and Programming Guide. Although the quality is not uniform, some of the examples are very good.
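For the library route, here is a minimal sketch of sorting on the GPU with Thrust. The helper name sortOnGpu is made up for illustration, and the sketch assumes the data starts out in a std::vector<int> on the host; only thrust::device_vector, thrust::sort and thrust::copy are actual Thrust calls.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical helper (not part of Thrust): returns a copy of `input`
// sorted in ascending order, with the sort itself done on the GPU.
std::vector<int> sortOnGpu(const std::vector<int>& input)
{
    // Constructing a device_vector from host iterators copies the data
    // into device memory.
    thrust::device_vector<int> d(input.begin(), input.end());

    // Ascending sort on the device (default comparison is operator<).
    thrust::sort(d.begin(), d.end());

    // Copy the sorted values back to the host.
    std::vector<int> result(d.size());
    thrust::copy(d.begin(), d.end(), result.begin());
    return result;
}
```

For small arrays the host-device copies will probably cost more than the sort itself, so this mostly pays off when the data already lives on the GPU or is large.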

Just take a look at the examples in the CUDA SDK and read the manuals. The “standard” parallelization of a for loop in CUDA looks like this:

for (int i = threadIdx.x; i < n; i += blockDim.x) doSomething(i);

This obviously only works if there are no dependencies between the iterations of the loop. If you want a parallel bubble sort, search for “odd-even transposition sort” in your search engine of choice. But just as sequential bubble sort is a really bad sorting algorithm, this one is also a reasonable choice in only a few cases (the only situation I can imagine where it would be useful is sorting warp-sized arrays) :)
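To make that concrete, here is a rough sketch of an odd-even transposition sort that uses the strided loop pattern from above. It is only an illustration under some assumptions: the kernel runs in a single block (so __syncthreads() can separate the phases), the array lives in global memory, and the names oddEvenSortKernel and oddEvenSort are made up. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Sorts data[0..n) in ascending order. Must be launched with ONE block,
// because __syncthreads() only synchronizes threads within a block.
__global__ void oddEvenSortKernel(int* data, int n)
{
    // n phases are enough to sort n elements.
    for (int phase = 0; phase < n; ++phase) {
        // Even phases compare pairs (0,1), (2,3), ...;
        // odd phases compare pairs (1,2), (3,4), ...
        int start = phase & 1;

        // Strided loop over the pairs of this phase. The pairs are
        // disjoint, so the iterations do not depend on each other.
        for (int i = start + 2 * threadIdx.x; i + 1 < n; i += 2 * blockDim.x) {
            if (data[i] > data[i + 1]) {
                int tmp = data[i];
                data[i] = data[i + 1];
                data[i + 1] = tmp;
            }
        }

        // Every swap of this phase must finish before the next one starts.
        __syncthreads();
    }
}

// Hypothetical host wrapper: copies the array to the GPU, sorts it there
// and copies it back.
void oddEvenSort(int* hostData, int n)
{
    int* d = nullptr;
    size_t bytes = n * sizeof(int);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, hostData, bytes, cudaMemcpyHostToDevice);
    oddEvenSortKernel<<<1, 256>>>(d, n);  // a single block of 256 threads
    cudaMemcpy(hostData, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

The dependency from the sequential inner loop (each comparison uses the result of the previous one) is gone because each phase only touches non-overlapping pairs. The price is that you always run n phases, so the total work is still quadratic.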

Thanks, you cleared up most of my doubts :)

Okay, thanks. I will look for other algorithms.