Segmentation fault while removing atomic instructions

Hi guys…

I am trying to remove the atomicMin instruction from the following code. I replaced it with a gpuMin function, but it always gives me a segmentation fault. Are the threads deadlocking in some way?

__device__ bool flag = true;

inline __device__ void gpuMin( int *temp, int newLabel)
{
	while( flag !=true){}
	flag = false;
	*temp = min( *temp, newLabel);
	flag = true;
}

inline __device__ void union(int* buf, unsigned char *buf_uchar, unsigned char seg1, unsigned char seg2, int reg1, int reg2, int* changed)
{
	if(seg1 == seg2)
	{
		int newReg1 = findRoot(buf, reg1);
		int newReg2 = findRoot(buf, reg2);

		if(newReg1 > newReg2) {
			//atomicMin(buf+newReg1, newReg2);
			gpuMin(buf+newReg1,newReg2);
			buf_uchar[newReg1] = min( buf_uchar[newReg1], newReg2);
			changed[0] = 1;
		} else if(newReg2 > newReg1) {
			//atomicMin(buf+newReg2, newReg1);
			gpuMin(buf+newReg2,newReg1);
			buf_uchar[newReg2] = min( buf_uchar[newReg2], newReg1);
			changed[0] = 1;
		}
	}
}

If there is anything wrong with my approach, please point it out. Also, what would be the best way to remove the atomicMin function…

Thanks…

Yes, the approach cannot work for multiple reasons:

    Device memory is not coherent between blocks (unless you use atomic operations).

    Multiple threads will enter the critical section at the same time, because testing the flag and clearing it are two separate, non-atomic steps.

    Any thread spinning on the “lock” will prevent other threads in the same warp from making progress.

    And probably others…

I wonder, however: why would you want to remove the atomicMin() in the first place? Any replacement that would even come close to working would be much more expensive.
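To illustrate the second point, here is a minimal sketch of what a correct read-modify-write looks like when the compare and the store are a single hardware step. The names casMin, proposeLabels and runCasMinDemo are made up for this example and are not from the posted code. Note that atomicCAS on 32-bit words in global memory itself needs compute capability 1.1, so this does not remove the hardware requirement; it only shows the pattern the plain flag variable cannot provide.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): minimum update built on atomicCAS.
// The comparison and the store happen as one indivisible step in hardware.
__device__ int casMin(int *addr, int value)
{
    int old = *addr;
    while (value < old) {
        int assumed = old;
        old = atomicCAS(addr, assumed, value);  // stores only if *addr == assumed
        if (old == assumed) break;              // our value was stored
    }
    return old;  // last value observed at addr by this thread
}

// Hypothetical test kernel: every thread proposes its own candidate label.
__global__ void proposeLabels(int *result)
{
    int candidate = (int)(blockIdx.x * blockDim.x + threadIdx.x) + 5;
    casMin(result, candidate);
}

// Host driver: returns the final minimum, or -1 if no usable device is present.
int runCasMinDemo()
{
    int *dResult = nullptr;
    int hResult = INT_MAX;
    if (cudaMalloc(&dResult, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(dResult, &hResult, sizeof(int), cudaMemcpyHostToDevice);
    proposeLabels<<<4, 64>>>(dResult);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(dResult); return -1; }
    cudaError_t err = cudaMemcpy(&hResult, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dResult);
    return err == cudaSuccess ? hResult : -1;
}
```

The loop retries only while the proposed value is still smaller than what is stored, so a thread that loses the race either tries again with the fresh value or gives up because someone else already wrote a smaller label.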

You are right that it will be expensive. But it is just a fallback mechanism in case the PC doesn’t have a graphics card with compute capability > 1.1.
I don’t mind an expensive way of doing it, but is there any way of doing it?

Unless buf is in shared memory, compute capability 1.1 will be sufficient. For 1.0 devices you might just fall back to the CPU.
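A rough sketch of how that fallback decision could look on the host side, assuming buf lives in global memory. The helper names hasGlobalAtomics, deviceSupportsAtomicMin and mergeRootsHost are hypothetical; only cudaGetDeviceProperties and the cudaDeviceProp fields major and minor come from the CUDA runtime API. The CPU version takes the two roots as already found, since findRoot was not shown in the post.

```cuda
#include <algorithm>
#include <cuda_runtime.h>

// 32-bit atomics on global memory (including atomicMin) need
// compute capability 1.1 or higher.
bool hasGlobalAtomics(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 1);
}

// Returns true if device `dev` can run the atomicMin version of the kernel.
// Any query failure (for example, no CUDA device) is treated as "use the CPU".
bool deviceSupportsAtomicMin(int dev)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return false;
    return hasGlobalAtomics(prop.major, prop.minor);
}

// Hypothetical CPU counterpart of the merge step in the posted code.
// It runs sequentially, so a plain min is enough and no lock is needed.
// Returns true if the label buffer was touched (the `changed` flag).
bool mergeRootsHost(int *buf, int newReg1, int newReg2)
{
    if (newReg1 == newReg2) return false;
    int hi = std::max(newReg1, newReg2);
    int lo = std::min(newReg1, newReg2);
    buf[hi] = std::min(buf[hi], lo);  // same effect as atomicMin(buf + hi, lo)
    return true;
}
```

The application would call deviceSupportsAtomicMin(0) once at startup and then pick either the kernel that uses atomicMin or a host loop built around mergeRootsHost.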