Hi, I'd like to binarize an 8-bit mono image.
I also want to minimize the size of the output data that has to be transferred from device to host.
So I'd like the output to be 1 bit per pixel, packed into unsigned ints (which atomic functions can operate on).
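For reference, here is a minimal host-side sketch of the packed layout described above, assuming pixel i goes to word i / 32, bit i % 32 (bit 0 = least significant), as in the kernel's nSet/nBit computation. The function name binarizePackCPU is hypothetical; it is only meant as a ground truth to check GPU output against.

```cuda
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the packed 1-bit layout:
// pixel i -> word i / 32, bit i % 32; the bit is 1 when the pixel is >= th.
std::vector<unsigned int> binarizePackCPU(const unsigned char* src,
                                          std::size_t numPixels,
                                          unsigned char th)
{
    // Round up so a partial last word still gets storage; start all bits at 0.
    std::vector<unsigned int> packed((numPixels + 31) / 32, 0u);
    for (std::size_t i = 0; i < numPixels; ++i) {
        if (src[i] >= th) {
            packed[i / 32] |= 1u << (i % 32);
        }
    }
    return packed;
}
```

For a 640x480 image this stores 307,200 pixels in 9,600 words (38,400 bytes) instead of 307,200 bytes, an 8x reduction in device-to-host traffic.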
Using atomic functions prevents lost updates when several threads modify bits of the same 32-bit word at the same time.
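To illustrate the lost-update problem, here is a minimal sketch (the names setBitsAtomically and runSetBitsDemo are hypothetical) in which 32 threads each set a different bit of one shared word. With a plain read-modify-write such as *word |= bit, threads can read the same old value and overwrite each other's bits; atomicOr makes each update indivisible, so all 32 bits survive.

```cuda
#include <cuda_runtime.h>

// Each of the 32 threads sets its own bit in the SAME 32-bit word.
// atomicOr performs the read-modify-write as one indivisible operation,
// so no thread's bit is overwritten by another thread's stale value.
__global__ void setBitsAtomically(unsigned int* word)
{
    atomicOr(word, 1u << threadIdx.x);
}

// Hypothetical host helper: launches one warp and returns the resulting word.
unsigned int runSetBitsDemo()
{
    unsigned int* d_word = nullptr;
    unsigned int h_word = 0;
    cudaMalloc(&d_word, sizeof(unsigned int));
    cudaMemset(d_word, 0, sizeof(unsigned int));
    setBitsAtomically<<<1, 32>>>(d_word);
    cudaMemcpy(&h_word, d_word, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_word);
    return h_word;  // all 32 bits set when every update landed
}
```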
My code is below, but it is very slow (about 200 ms on a GTX 285).
After checking, I found that the bottleneck is the atomic operations.
Is there a faster way to do this that still prevents multiple threads from clobbering each other's bits in the same 32-bit word?
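One atomic-free approach, sketched here for comparison with the kernel that follows, is a warp vote: __ballot_sync returns a 32-bit mask whose bit k is set when lane k's predicate is true, which is exactly the packed word for 32 consecutive pixels, so lane 0 can store it with a plain write. This assumes a GPU of compute capability 2.0 or later (warp vote is not available on the GTX 285, which is compute capability 1.3) and CUDA 9 or later for __ballot_sync. It also assumes a 1-D launch over the linearized image with a block size that is a multiple of 32. The names binarizeBallot and binarizeBallotHost are hypothetical.

```cuda
#include <cuda_runtime.h>

// One thread per pixel; the 32 lanes of a warp combine their results with a
// warp vote instead of atomics. Requires blockDim.x to be a multiple of 32.
__global__ void binarizeBallot(const unsigned char* src, unsigned int* dst,
                               int numPixels, unsigned char th)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // No early return: out-of-range lanes still vote (with false) so the
    // full-warp mask passed to __ballot_sync stays valid.
    bool on = (i < numPixels) && (src[i] >= th);
    unsigned int word = __ballot_sync(0xFFFFFFFFu, on);
    // Lane 0 holds pixel i with i % 32 == 0 and writes the whole word alone.
    if ((threadIdx.x & 31) == 0 && i < numPixels) {
        dst[i / 32] = word;
    }
}

// Hypothetical host helper: copies the 8-bit image in and the packed bits out.
void binarizeBallotHost(const unsigned char* h_src, int numPixels,
                        unsigned char th, unsigned int* h_dst)
{
    int numWords = (numPixels + 31) / 32;
    unsigned char* d_src = nullptr;
    unsigned int* d_dst = nullptr;
    cudaMalloc(&d_src, numPixels);
    cudaMalloc(&d_dst, numWords * sizeof(unsigned int));
    cudaMemcpy(d_src, h_src, numPixels, cudaMemcpyHostToDevice);
    const int threads = 256;  // multiple of 32, as the kernel requires
    int blocks = (numPixels + threads - 1) / threads;
    binarizeBallot<<<blocks, threads>>>(d_src, d_dst, numPixels, th);
    cudaMemcpy(h_dst, d_dst, numWords * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
}
```

Because each output word is produced by exactly one lane with a single store, the output buffer does not need to be cleared first and no two threads ever write the same word.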
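#define IMUL(a, b) __mul24(a, b)  // 24-bit integer multiply, as in the older CUDA SDK samples

// Source: 8-bit mono image, one byte per pixel.
// Dest:   packed result, 1 bit per pixel (pixel i -> word i/32, bit i%32).
// If Dest is cleared with cudaMemset before the launch, the atomicAnd
// branch is unnecessary and only the atomicOr is needed.
__global__ void Binarize(const unsigned char* Source, unsigned int* Dest,
                         int SizeX, int SizeY, unsigned char th)
{
    unsigned int idxX = IMUL(blockIdx.x, blockDim.x) + threadIdx.x;
    unsigned int idxY = IMUL(blockIdx.y, blockDim.y) + threadIdx.y;
    if (idxX >= (unsigned int)SizeX || idxY >= (unsigned int)SizeY)
        return;

    unsigned int index_in = IMUL(idxY, SizeX) + idxX;  // linear pixel index
    unsigned int nSet = index_in / 32;                 // which 32-bit word
    unsigned int nBit = index_in % 32;                 // which bit in that word
    unsigned int mask = 1u << nBit;

    if (Source[index_in] >= th)
        atomicOr(Dest + nSet, mask);
    else
        atomicAnd(Dest + nSet, ~mask);  // ~mask clears only this bit; the original
                                        // (OneVal<<nBit) ^ 0xFFFF also cleared other pixels' bits
}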
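For hardware without warp vote, such as the GTX 285, a sketch that also avoids atomics is to give each thread one output word instead of one pixel: the thread reads the 32 pixels that map to its word and writes the word with a single plain store, so no two threads ever touch the same word. This assumes the image rows are contiguous, so the linear index matches index_in in the kernel above. The names binarizePerWord and binarizePerWordHost are hypothetical.

```cuda
#include <cuda_runtime.h>

// Each thread owns one output word and packs the 32 pixels that map to it.
// Uses only features available on compute capability 1.x.
__global__ void binarizePerWord(const unsigned char* src, unsigned int* dst,
                                int numPixels, unsigned char th)
{
    int w = blockIdx.x * blockDim.x + threadIdx.x;  // output word index
    int first = w * 32;                             // first pixel of this word
    if (first >= numPixels) return;
    int count = min(32, numPixels - first);         // last word may be partial
    unsigned int word = 0;
    for (int k = 0; k < count; ++k) {
        if (src[first + k] >= th) word |= 1u << k;  // pixel first+k -> bit k
    }
    dst[w] = word;                                  // single plain store
}

// Hypothetical host helper: copies the 8-bit image in and the packed bits out.
void binarizePerWordHost(const unsigned char* h_src, int numPixels,
                         unsigned char th, unsigned int* h_dst)
{
    int numWords = (numPixels + 31) / 32;
    unsigned char* d_src = nullptr;
    unsigned int* d_dst = nullptr;
    cudaMalloc(&d_src, numPixels);
    cudaMalloc(&d_dst, numWords * sizeof(unsigned int));
    cudaMemcpy(d_src, h_src, numPixels, cudaMemcpyHostToDevice);
    const int threads = 128;
    int blocks = (numWords + threads - 1) / threads;
    binarizePerWord<<<blocks, threads>>>(d_src, d_dst, numPixels, th);
    cudaMemcpy(h_dst, d_dst, numWords * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
}
```

Each thread reads 32 consecutive bytes, so neighboring threads' loads are 32 bytes apart and are unlikely to coalesce well on compute 1.x hardware; staging a block's pixels in shared memory first, or reading them as uchar4, are common refinements to try.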