Hi,
Actually, this is not really a bug in my code. Let me define the problem I am trying to solve.
I am given a float* array of size width*height (actually pitch*height).
I am trying to sort each column. So I launch one thread per column, and each thread does a comb sort on its own column.
Here is my code:
// Sort each column of the matrix tab, one thread per column.
// pitch is used here as the row stride in elements (floats), not in bytes.
__global__ void cuParallelCombSort2(float *tab, int width, int height, int pitch){
    int i1, i2;
    float v1, v2;
    int gap = height;
    int swapped = 1;
    unsigned int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    if (xIndex < width){
        // Loop until the gap has shrunk to 1 and a full pass made no swap.
        while ((gap > 1) || swapped){
            swapped = 0;
            gap = gap * 10 / 13;    // shrink factor of about 1.3
            if (gap < 1)
                gap = 1;
            else if (gap == 9 || gap == 10)
                gap = 11;
            // i1 and i2 are the offsets of the two rows being compared.
            for (i1 = 0, i2 = gap*pitch; i1 <= (height - 1 - gap)*pitch; i1 += pitch, i2 += pitch) {
                v1 = *(tab + xIndex + i1);
                v2 = *(tab + xIndex + i2);
                if (v1 > v2) {
                    *(tab + xIndex + i1) = v2;
                    *(tab + xIndex + i2) = v1;
                    swapped = 1;
                }
            }
        }
    }
}
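To check what the kernel produces, a CPU reference along these lines could be used. It is only a sketch: the names combSortColumnsHost and columnsSorted are made up for this post, the matrix is assumed to be row-major with rows pitch floats apart, and the loop uses the standard comb sort condition (keep going while gap > 1 or the last pass swapped something). It is plain host-side C++, so it should also compile inside the same .cu file.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference (name made up for this post): comb sort on every
// column of a row-major matrix whose rows are `pitch` floats apart.
void combSortColumnsHost(std::vector<float>& tab, int width, int height, int pitch) {
    assert(width >= 0 && height >= 0 && pitch >= width);
    assert(tab.size() >= static_cast<std::size_t>(pitch) * height);
    for (int x = 0; x < width; ++x) {
        int gap = height;
        bool swapped = true;
        // Stop only once the gap is 1 and a full pass made no swap.
        while (gap > 1 || swapped) {
            swapped = false;
            gap = gap * 10 / 13;
            if (gap < 1) gap = 1;
            else if (gap == 9 || gap == 10) gap = 11;
            for (int y = 0; y + gap < height; ++y) {
                float& a = tab[static_cast<std::size_t>(y) * pitch + x];
                float& b = tab[static_cast<std::size_t>(y + gap) * pitch + x];
                if (a > b) {
                    std::swap(a, b);
                    swapped = true;
                }
            }
        }
    }
}

// True if every column is in non-decreasing order from top to bottom.
bool columnsSorted(const std::vector<float>& tab, int width, int height, int pitch) {
    for (int x = 0; x < width; ++x)
        for (int y = 0; y + 1 < height; ++y)
            if (tab[static_cast<std::size_t>(y) * pitch + x] >
                tab[static_cast<std::size_t>(y + 1) * pitch + x])
                return false;
    return true;
}
```

Copying the device result back with cudaMemcpy2D and passing it to columnsSorted, or comparing it element by element against the reference output, would show whether every column really ends up sorted.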
When I use the profiler, I see that the computation time is never the same from one run to the next (I use a for loop to measure several runs). I initialize my array with the same values on each loop iteration. This is very weird, because the rest of my code has a constant computation time…
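A measurement loop of this kind could be sketched roughly as follows. This is not the profiler run itself: it times the kernel with CUDA events instead, and the matrix size, run count, block size and the fixed pseudo-random input are all assumptions chosen for illustration. The kernel is repeated in compact form (standard comb sort loop condition) so the example is self-contained.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Compact copy of the column sort kernel; pitch is the row stride in floats.
__global__ void cuParallelCombSort2(float *tab, int width, int height, int pitch) {
    int xIndex = blockIdx.x * blockDim.x + threadIdx.x;
    if (xIndex >= width) return;
    int gap = height, swapped = 1;
    while (gap > 1 || swapped) {
        swapped = 0;
        gap = gap * 10 / 13;
        if (gap < 1) gap = 1;
        else if (gap == 9 || gap == 10) gap = 11;
        for (int y = 0; y + gap < height; ++y) {
            float v1 = tab[y * pitch + xIndex];
            float v2 = tab[(y + gap) * pitch + xIndex];
            if (v1 > v2) {
                tab[y * pitch + xIndex] = v2;
                tab[(y + gap) * pitch + xIndex] = v1;
                swapped = 1;
            }
        }
    }
}

int main() {
    const int width = 1024, height = 512, runs = 10;  // assumed sizes

    // Fixed pseudo-random input, so every run sorts exactly the same data.
    std::vector<float> host(static_cast<size_t>(width) * height);
    unsigned int s = 12345u;
    for (float &v : host) {
        s = s * 1664525u + 1013904223u;
        v = (s >> 8) * (1.0f / 16777216.0f);
    }

    float *dev = nullptr;
    size_t pitchBytes = 0;
    cudaMallocPitch((void **)&dev, &pitchBytes, width * sizeof(float), height);
    const int pitch = static_cast<int>(pitchBytes / sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (width + threads - 1) / threads;

    for (int r = 0; r < runs; ++r) {
        // Restore the unsorted input outside the timed region.
        cudaMemcpy2D(dev, pitchBytes, host.data(), width * sizeof(float),
                     width * sizeof(float), height, cudaMemcpyHostToDevice);

        cudaEventRecord(start);
        cuParallelCombSort2<<<blocks, threads>>>(dev, width, height, pitch);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // wait until the kernel has finished

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("run %d: %.3f ms (%s)\n", r, ms,
               cudaGetErrorString(cudaGetLastError()));
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    return 0;
}
```

Resetting the input before every run matters here: if the array is left sorted from the previous iteration, later runs do far fewer swaps and will naturally take a different time than the first one.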
Thanks for the help,
Vince