Quicksort problem


I have a matrix (float*) of size width*height.

I’d like to sort each column independently.

So I’d like to use quicksort, but a modified version.

I use a 1D grid where each thread correspons to one column.

Each thread use the modified quicksort.

The quicksort algorithm use recursive function.

I paste the code bellow.

The problem is that I cannot compile this code.

Thanks for your help =)


__device__ int partition(float* a, int w, int l, int r) {

    float pivot, t;

    int i, j;

    pivot = a[l*w];

    i = l; j = r+1;




        do ++i; while( a[i*w] <= pivot && i <= r );

        do --j; while( a[j*w] > pivot );

        if( i >= j ) break;

        t = a[i*w]; a[i*w] = a[j*w]; a[j*w] = t;


    t = a[l*w]; a[l*w] = a[j*w]; a[j*w] = t;

    return j;


__device__ void quickSort(float* a, int w, int l, int r)


    int j;


    if( l < r )


        // divide and conquer

        j = partition( a, w, l, r);

        quickSort( a, w, l, j-1);

        quickSort( a, w, j+1, r);




// Sort the column of matric tab.

__global__ void cuParallelQuickSort(float *tab, int width, int height, int pitch){

    unsigned int xIndex = blockIdx.x * BLOCK_DIM + threadIdx.x;

    if (xIndex<width){

        quickSort(&tab[xIndex], width, 0, height-1);



I think i can remember that device functions does not support recursions.


Yes, CUDA does not support recursion (all functions are inlined).

There is a bitonic sort example in the SDK.

I am curious though. In CUDA programming manual it is mentioned that it is possible, though not always, to force a function not to be inlined via noinline construct (or something like that). Would it be possible the have recursive call of this function. I suppose not, as this is described in CUDA manual; nevertheless, I am curious of limitations which do not allows this. Stack depth? …