Recursion in CUDA on supported hardware

Hi guys, I'm trying to parallelize a sequential version of Quicksort.

The program should run recursively.

The conversion is almost done, except that I get a message saying recursion isn't supported yet, although my research says my hardware does support recursion in CUDA.

I'm compiling the project for a GTX 560 Ti, and I'm getting these error messages:

1>c:/Users/All Users/NVIDIA Corporation/NVIDIA GPU Computing SDK 3.2/C/src/Parallel Quicksort/Parallel Quicksort.cu(136): Error: Recursive function call is not supported yet: quicksortDevice(int*, int, int)
1>c:/Users/All Users/NVIDIA Corporation/NVIDIA GPU Computing SDK 3.2/C/src/Parallel Quicksort/Parallel Quicksort.cu(135): Error: Recursive function call is not supported yet: quicksortDevice(int*, int, int)

I've hit a dead end on how to get the recursion working.

Here's a code snippet for the recursion:

__device__ void quicksortDevice(int *list, int m, int n)
{
… some code…
quicksortDevice(list, m, j-1);
quicksortDevice(list, j+1, n);
}
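For reference, here is a minimal host-side sketch of what a complete recursive function with the same shape as quicksortDevice might look like. The partition step is my assumption (a Lomuto partition using the last element as the pivot), since the post elides it, and the names partition and quicksort are hypothetical. The body is plain C, so in a .cu file it could be marked __device__; the two self-calls are exactly what the compiler rejects unless you build for sm_20 or later.

```c
/* Hypothetical Lomuto partition (an assumption, not the poster's code):
   uses list[n] as the pivot and returns its final index j, so that
   list[m..j-1] <= pivot < list[j+1..n]. */
static int partition(int *list, int m, int n)
{
    int pivot = list[n];
    int i = m;                          /* next slot for an element <= pivot */
    for (int k = m; k < n; ++k) {
        if (list[k] <= pivot) {
            int t = list[i]; list[i] = list[k]; list[k] = t;
            ++i;
        }
    }
    int t = list[i]; list[i] = list[n]; list[n] = t;   /* place the pivot */
    return i;
}

/* Sorts list[m..n] inclusive, recursing on both sides of the pivot. */
void quicksort(int *list, int m, int n)
{
    if (m < n) {
        int j = partition(list, m, n);
        quicksort(list, m, j - 1);      /* elements left of the pivot  */
        quicksort(list, j + 1, n);      /* elements right of the pivot */
    }
}
```

Call it as quicksort(list, 0, len - 1) to sort the whole array.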

Please help. Thank you :P

Compile with nvcc -arch sm_21 to generate code for your recursion-capable GPU.

How can I do that? Would you mind elaborating a bit?

I'm still new to CUDA programming, and I'm using the template provided with the CUDA SDK.

I'm running VS2008 64-bit on Windows 7 Ultimate 64-bit.

Thanks in advance.

Sorry, I have no idea about Visual Studio. You’ll find the option somewhere in the project settings.

There have been similar questions in this forum, maybe you can find the answer with a forum search.

It's alright, problem solved, thanks to your guidance as well.
I was able to find the build rule and add sm_21 to the compiler options.

Anyway, does anyone happen to have source code for sorting algorithms in CUDA, like quicksort, bubble sort, merge sort and so on?

I’ve not yet needed to sort anything on the GPU. Have you looked at the radix sort and sorting networks examples in the SDK?

I did indeed, but the code they provided seems a bit too high-level for my current knowledge.

My major concern is synchronization between threads during parallel execution, since sorting requires data integrity in memory.
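One way to see how a sort can keep thread coordination simple is odd-even transposition sort (my choice of illustration, not something from the SDK samples). In each phase, every compare-exchange touches a disjoint pair of elements, so in a CUDA version each thread could own one pair, and the only synchronization needed is a barrier between phases, for example __syncthreads() when the whole array fits in one block. The sketch below is sequential C with a hypothetical function name odd_even_sort; the inner loop is the part that would be spread across threads.

```c
/* Odd-even transposition sort of list[0..len-1] in ascending order.
   len phases are enough to sort any input. Within one phase the pairs
   (i, i+1) never overlap, so they could all be processed in parallel;
   the phases themselves must run one after another. */
void odd_even_sort(int *list, int len)
{
    for (int phase = 0; phase < len; ++phase) {
        /* Even phases compare (0,1),(2,3),...; odd phases compare (1,2),(3,4),... */
        for (int i = phase % 2; i + 1 < len; i += 2) {
            if (list[i] > list[i + 1]) {
                int t = list[i];
                list[i] = list[i + 1];
                list[i + 1] = t;
            }
        }
        /* In a single-block CUDA kernel, a __syncthreads() barrier
           would go here, between phases. */
    }
}
```

It does O(n^2) comparisons in total, so it is mainly useful for small arrays or as a way to reason about where the barriers go.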

Hi, I am having the same problem as you…

I am on VS2010, and I found the setting under Project Properties → CUDA C/C++ → Code Generation.

Originally it was compute_10,sm_10.

I tried many combinations, including the most logical one:

compute_20,sm_20

…which are the right parameters?

Both. Basically, select the compute capability you are targeting: 1.0, 1.1, 1.2, 1.3, 2.0, 2.1, etc. For a thorough explanation, I suggest reading Section 3.1 of the CUDA C Programming Guide (I'm currently looking at version 3.1).

Not really sure about that, but I'm guessing that as long as your GPU supports a given version, you can just set it to that one.
Mine supports 2.1, so I use sm_21 for compilation.

There are non-recursive implementations of quicksort, which are faster (on CPUs at least):

http://www.seeingwithc.org/topic2html.html
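A minimal sketch of the non-recursive approach, assuming the common technique of keeping pending subranges on a small explicit stack (I have not checked that it matches the linked page). The Lomuto partition, the names quicksort_iterative and partition, and the STACK_MAX bound are all hypothetical. Because there are no self-calls, this style also avoids the "Recursive function call is not supported yet" error on targets without recursion support.

```c
#define STACK_MAX 64   /* hypothetical bound; pushing the larger side keeps depth near log2(len) */

/* Hypothetical Lomuto partition: pivot is list[n], returns its final index. */
static int partition(int *list, int m, int n)
{
    int pivot = list[n];
    int i = m;
    for (int k = m; k < n; ++k) {
        if (list[k] <= pivot) {
            int t = list[i]; list[i] = list[k]; list[k] = t;
            ++i;
        }
    }
    int t = list[i]; list[i] = list[n]; list[n] = t;
    return i;
}

/* Sorts list[m..n] inclusive without recursion: pending subranges live
   on an explicit stack instead of the call stack. */
void quicksort_iterative(int *list, int m, int n)
{
    int lo[STACK_MAX], hi[STACK_MAX];
    int top = 0;
    lo[top] = m; hi[top] = n; ++top;
    while (top > 0) {
        --top;
        int a = lo[top], b = hi[top];
        while (a < b) {
            int j = partition(list, a, b);
            /* Defer the larger side and keep working on the smaller one,
               so each stacked range is at most half the size of the one before. */
            if (j - a > b - j) {
                lo[top] = a; hi[top] = j - 1; ++top;
                a = j + 1;
            } else {
                lo[top] = j + 1; hi[top] = b; ++top;
                b = j - 1;
            }
        }
    }
}
```

The design choice of always deferring the larger part is what bounds the stack; deferring the smaller part instead could need a stack as deep as the array on already-sorted input.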

Quicksort is a sorting algorithm. It looks at the values being sorted and, on that basis, decides whether there is still something left to sort. That is why it is very efficient.

This cannot be made parallel. Or it can, with a very complicated messaging system, but then the most frequent message among the threads will be: "Hey others, wait for me to finish; do not sort before I am ready." The performance will be lousy, just a little bit faster than quicksort on a single CPU core.

In the SDK there are code samples of sorting networks. They are perfect for parallel processing, though they are not as efficient as quicksort: they are just somewhat faster than quicksort on a single CPU core, by a factor of 2, 3 or 4.
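To make the sorting-network idea concrete, here is a minimal sequential sketch of a bitonic sorting network, one common kind of network (I have not checked which networks the SDK sample implements). It assumes the length is a power of two, and bitonic_sort is a hypothetical name. Within each (k, j) pass, every compare-exchange works on its own pair, so a GPU version could assign one thread per pair and separate passes with a barrier or a new kernel launch. The comparison pattern is fixed regardless of the data, which is why it does O(n log^2 n) comparisons, more than quicksort's average O(n log n).

```c
/* Bitonic sort of list[0..len-1] in ascending order.
   Assumes len is a power of two. Each (k, j) pass is a set of
   independent compare-exchanges on disjoint pairs (i, i ^ j). */
void bitonic_sort(int *list, int len)
{
    for (int k = 2; k <= len; k <<= 1) {          /* size of the bitonic runs being merged */
        for (int j = k >> 1; j > 0; j >>= 1) {    /* distance between compared elements */
            for (int i = 0; i < len; ++i) {
                int partner = i ^ j;
                if (partner > i) {                /* handle each pair only once */
                    int ascending = (i & k) == 0; /* direction of this run */
                    if ((ascending && list[i] > list[partner]) ||
                        (!ascending && list[i] < list[partner])) {
                        int t = list[i];
                        list[i] = list[partner];
                        list[partner] = t;
                    }
                }
            }
        }
    }
}
```

In the last outer iteration k equals len, so every run is ascending and the whole array ends up sorted in ascending order.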

Good luck anyway.