Wrong behavior when moving from emudebug mode to debug
Algo not working in debug mode but working in

Hi,

I have created a simple shell-sort algo that I tested in emudebug mode. Everything is working well and the results are correct.

But when I test it in debug mode, the results are wrong: the table is not sorted at all!

I don’t understand this behavior.

Here is my kernel call

sort<<<BLCKNB, BLCKTH>>>(device_result, SIZE);

and my kernel code

__device__ __constant__ int shell[16] = {1391376, 463792, 198768, 86961, 33936, 13776, 4592, 1968, 861, 336, 112, 48, 21, 7, 3, 1};

__global__ void sort(int *table, int size)
{
	int start;
	int i, j, val;

	int idx = blockIdx.x * blockDim.x + threadIdx.x;

	if (idx < 16)
	{
		start = shell[idx];
		for (i = start; i < size; i++)
		{
			val = table[i];
			j = i;
			while (j >= start && table[j - start] > val)
			{
				table[j] = table[j - start];
				j = j - start;
			}
			table[j] = val;
		}
	}
}

The "wrong" results are simple: after sorting, the table contains the same numbers several times!

Thanks for your help!

Code that works in EMUdebug but not in debug is a sign that, as it stands, it isn't fully parallelised.

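In EMUdebug only one thread runs at a time, and the threads are run in order. So thread 0 will start and run until it reaches a __syncthreads() or some other point where it has to stop, then thread 1 will run to the same point, then thread 2, … Your kernel has no such point, so thread 0 finishes its whole pass with the largest gap before thread 1 starts with the next gap, which is exactly what a serial shell sort does.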

In debug mode the threads run in parallel.
So up to 16 threads (those with idx < 16) are moving values through the same table simultaneously, each with a different stride. At some point two or more threads move the same value, and the result is that it is duplicated.
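To make the difference concrete, here is a minimal host-side sketch of what the posted kernel effectively computes when the 16 threads run one after another, assuming emulation behaves as described above. The names kGaps, gap_pass and serial_shell_sort are made up for this illustration; the gap values and the inner loop are copied from the original post.

```cuda
#include <vector>

// Gap sequence copied from the original post (host-side copy for this sketch).
static const int kGaps[16] = {1391376, 463792, 198768, 86961, 33936, 13776, 4592,
                              1968, 861, 336, 112, 48, 21, 7, 3, 1};

// One gapped insertion-sort pass: the same loop each thread runs in the posted kernel.
void gap_pass(std::vector<int> &table, int gap)
{
    int size = static_cast<int>(table.size());
    for (int i = gap; i < size; i++) {
        int val = table[i];
        int j = i;
        while (j >= gap && table[j - gap] > val) {
            table[j] = table[j - gap];
            j -= gap;
        }
        table[j] = val;
    }
}

// Hypothetical helper: "thread" t runs to completion before "thread" t + 1 starts,
// so the gaps are applied strictly from largest to smallest.
void serial_shell_sort(std::vector<int> &table)
{
    for (int t = 0; t < 16; t++)
        gap_pass(table, kGaps[t]);
}
```

On the GPU there is no such ordering between threads: all 16 passes read and write the same table at once, so a value shifted by one pass can be overwritten or copied again by another.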

I think you need a substantially different algorithm for sorting in parallel.
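One possible restructuring, as a sketch rather than a tuned solution: keep the gaps sequential by launching one kernel per gap, and parallelise inside each pass instead. For a given gap, the elements k, k + gap, k + 2*gap, … form independent chains, so one thread per chain never touches another thread's elements. The names h_gaps, shell_pass and shell_sort_device are invented for this example, and the block size of 256 is an arbitrary choice.

```cuda
#include <cuda_runtime.h>

// Gap sequence from the original post, kept on the host so the launch loop can read it.
static const int h_gaps[16] = {1391376, 463792, 198768, 86961, 33936, 13776, 4592,
                               1968, 861, 336, 112, 48, 21, 7, 3, 1};

// One pass for a single gap. Thread `chain` insertion-sorts the elements
// chain, chain + gap, chain + 2*gap, ... so no two threads share an element.
__global__ void shell_pass(int *table, int size, int gap)
{
    int chain = blockIdx.x * blockDim.x + threadIdx.x;
    if (chain >= gap || chain >= size)
        return;

    for (int i = chain + gap; i < size; i += gap) {
        int val = table[i];
        int j = i;
        while (j >= gap && table[j - gap] > val) {
            table[j] = table[j - gap];
            j -= gap;
        }
        table[j] = val;
    }
}

// Hypothetical host wrapper: kernels launched into the same stream run in order,
// so each gap pass finishes before the next one starts.
void shell_sort_device(int *d_table, int size)
{
    const int threads = 256;  // assumed block size, not taken from the post
    for (int g = 0; g < 16; g++) {
        int gap = h_gaps[g];
        if (gap >= size)
            continue;  // nothing to do for gaps larger than the table
        int blocks = (gap + threads - 1) / threads;
        shell_pass<<<blocks, threads>>>(d_table, size, gap);
    }
    cudaDeviceSynchronize();
}
```

It would be called as shell_sort_device(device_result, SIZE) in place of the original launch. Note that the available parallelism shrinks with the gap, and the final pass (gap 1) runs in a single thread, so this fixes the duplicated values but is not fast. A sorting network such as bitonic sort, or a library routine like thrust::sort, is likely a better fit for the GPU.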
