Non-constant kernel time across iterations

Hi,

Strictly speaking, this is not a bug report: my code runs. Let me describe the problem I am trying to solve.

I have a float* array of size width * height (actually pitch * height).

I want to sort each column, so I launch one thread per column and each thread runs a comb sort on its own column.

Here is my code:

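// Sort each column of the matrix tab in place (one thread per column).
// pitch is used here as the row stride in elements (floats), not bytes.
__global__ void cuParallelCombSort2(float *tab, int width, int height, int pitch){

    int    i1, i2;          // offsets of the two rows being compared
    float  v1, v2;
    int    gap     = height;
    int    swapped = 1;
    unsigned int xIndex = blockIdx.x * blockDim.x + threadIdx.x;

    if (xIndex < width){

        // Comb sort is finished only when the gap is 1 AND a full pass made
        // no swap, so the loop continues while either condition still holds.
        while ((gap > 1) || swapped){

            swapped = 0;

            // Shrink the gap by a factor of about 1.3.
            gap = gap * 10 / 13;
            if (gap < 1)
                gap = 1;
            else if (gap == 9 || gap == 10)
                gap = 11;

            for (i1 = 0, i2 = gap*pitch; i1 <= (height - 1 - gap)*pitch; i1 += pitch, i2 += pitch) {
                v1 = *(tab + xIndex + i1);
                v2 = *(tab + xIndex + i2);
                if (v1 > v2) {
                    *(tab + xIndex + i1) = v2;
                    *(tab + xIndex + i2) = v1;
                    swapped = 1;
                }
            }
        }
    }
}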
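For cross-checking, here is a minimal host-side sketch of the same per-column comb sort in plain C++. It is not part of my project: the function name combSortColumnsHost and the returned pass count are hypothetical, added only for illustration, and pitch is assumed to be a row stride in elements, as in the kernel. It uses the textbook termination rule (stop once the gap is 1 and a pass makes no swap). The pass count depends only on the input, so identical input should mean identical work on every run.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference (illustration only, not the original code).
// Sorts every column of a row-major matrix in place and returns the total
// number of comb-sort passes performed over all columns.
// pitch is the row stride in elements (floats), not bytes.
std::size_t combSortColumnsHost(std::vector<float>& tab,
                                int width, int height, int pitch)
{
    assert(width >= 0 && height >= 0 && pitch >= width);
    assert(tab.size() >= static_cast<std::size_t>(pitch) * height);

    std::size_t passes = 0;
    for (int x = 0; x < width; ++x) {          // one column per "thread"
        int  gap     = height;
        bool swapped = true;

        // Continue until the gap is 1 and a full pass makes no swap.
        while (gap > 1 || swapped) {
            gap = gap * 10 / 13;               // shrink by about 1.3
            if (gap < 1)
                gap = 1;
            else if (gap == 9 || gap == 10)
                gap = 11;

            swapped = false;
            for (int row = 0; row + gap < height; ++row) {
                float& a = tab[static_cast<std::size_t>(row) * pitch + x];
                float& b = tab[static_cast<std::size_t>(row + gap) * pitch + x];
                if (a > b) {
                    std::swap(a, b);
                    swapped = true;
                }
            }
            ++passes;
        }
    }
    return passes;
}
```

Running it twice on copies of the same input and comparing both the sorted data and the pass counts is a quick way to confirm that the algorithm itself does a fixed amount of work for a fixed input.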

When I run the profiler, the kernel's computation time is never the same from one launch to the next (I launch it in a for loop to collect several timings). I reinitialize the array with the same values before each loop iteration, so the input is identical every time. This is very odd, because the rest of my code has a constant computation time…

Thanks for the help,

Vince