Why do different operations in a thread affect atomicAdd()?

Hi all, I am seeking help to understand this:
doRsltIdx gets the wrong value when (pRegionInfo[blockIdx.x].pixelIndexE - pRegionInfo[blockIdx.x].pixelIndexB) is a random value in (0, 40000).

__global__ void RegionCal(REGION_INFO* pRegionInfo, int* diRsltMaxNum,
                          int* doRsltNum, int* doRsltIdx)
{
    // Number of pixels handled by this block (inclusive index range).
    int blockPixelNum = pRegionInfo[blockIdx.x].pixelIndexE - pRegionInfo[blockIdx.x].pixelIndexB + 1;

    // Pixels per thread, rounded up.
    int threadPixelNum = (blockPixelNum - 1) / blockDim.x + 1;

    for (int i = 0; i < threadPixelNum; ++i)
    {
        if (atomicAdd(doRsltIdx, 1) < *diRsltMaxNum)
        {
            atomicAdd(doRsltNum, 1);
        }
    }
}

void CalRegion(REGION_INFO* pRegionInfo, int* iRegionNum, int* diRsltMaxNum, int* doRsltNum, int* doRsltIdx)
{
    rsltNum = 0;
    dim3 blockSize(1024);
    dim3 gridSize(iRegionNum);

    RegionCal<<<gridSize, blockSize>>>(pRegionInfo, diRsltMaxNum, doRsltNum, doRsltIdx);
}

Please format your code properly. One possible set of steps is as follows:

  • edit your post by clicking the pencil icon at the bottom of your post
  • select the code
  • click the </> button at the top of the edit pane
  • save your edits

It's not really clear what the intent of your code is, but this:

if (atomicAdd(doRsltIdx, 1) < *diRsltMaxNum)
{
	atomicAdd(doRsltNum, 1);
}

is unlikely to be correct, in my view. For example, if the intent is to limit the value in the doRsltIdx location to no more than *diRsltMaxNum, the code won’t do that. The key reason is that the atomic operation requested in the if condition always executes whenever a thread reaches that point, regardless of how the comparison turns out. As a result, doRsltIdx is never limited.
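To make the point above concrete, here is a minimal sketch that runs the same if/atomicAdd pattern and reads both counters back. The names countUnbounded and runCountUnbounded are hypothetical, both counters are assumed to start at 0, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Same pattern as in the question: the atomicAdd in the condition runs
// for every thread, whether or not the comparison succeeds.
__global__ void countUnbounded(int* idx, int* num, const int* maxNum)
{
    if (atomicAdd(idx, 1) < *maxNum)
    {
        atomicAdd(num, 1);
    }
}

// Hypothetical host helper: launches blocks * threads threads and returns
// the final values of both counters through the out-parameters.
void runCountUnbounded(int blocks, int threads, int maxNum,
                       int* idxOut, int* numOut)
{
    int *dIdx, *dNum, *dMax;
    cudaMalloc(&dIdx, sizeof(int));
    cudaMalloc(&dNum, sizeof(int));
    cudaMalloc(&dMax, sizeof(int));

    // Assumption: both counters start at zero.
    cudaMemset(dIdx, 0, sizeof(int));
    cudaMemset(dNum, 0, sizeof(int));
    cudaMemcpy(dMax, &maxNum, sizeof(int), cudaMemcpyHostToDevice);

    countUnbounded<<<blocks, threads>>>(dIdx, dNum, dMax);

    // These blocking copies on the default stream wait for the kernel.
    cudaMemcpy(idxOut, dIdx, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(numOut, dNum, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIdx);
    cudaFree(dNum);
    cudaFree(dMax);
}
```

With 4 blocks of 256 threads and a limit of 100, idx should come back as 1024 (one increment per thread), while num should come back as 100, because only the threads that received an old value below the limit take the branch.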

Thank you so much for your reply! My intention is to limit the value in the doRsltNum location to no more than *diRsltMaxNum. In my project, I may get a large number of calculation results on the GPU, but I want to deliver no more than 10000 results to the CPU.

if (atomicAdd(doRsltIdx, 1) < *diRsltMaxNum)
{
	doRslt[atomicAdd(doRsltNum, 1)] = rslt; 
}

I am new to CUDA. Can you show me the proper way…

Please format your post properly.

A simple solution is to call atomicAdd unconditionally, without a separate check, and to perform the assignment only if its return value is below the limit.

int position = atomicAdd(doRsltIdx, 1);
if(position < *diRsltMaxNum){
    doRslt[position] = rslt;
}

This will never write more than *diRsltMaxNum elements. The counter itself keeps growing past the limit, so to find the number of values actually written you need to clamp it manually at the end.

resultsize = min(*doRsltIdx, *diRsltMaxNum);
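Putting the two snippets above together, a self-contained sketch could look like the following. The names appendBounded and runAppendBounded are hypothetical, the per-thread result is a placeholder (the global thread index), maxNum is assumed to be positive, and error checking is omitted.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Each thread reserves a slot unconditionally and writes only if the
// reserved slot lies inside the output buffer.
__global__ void appendBounded(int* counter, const int* maxNum, int* out)
{
    // Placeholder for the real computation (assumption for illustration).
    int rslt = blockIdx.x * blockDim.x + threadIdx.x;

    int position = atomicAdd(counter, 1);
    if (position < *maxNum)
    {
        out[position] = rslt;
    }
}

// Hypothetical host helper: returns at most maxNum results.
std::vector<int> runAppendBounded(int blocks, int threads, int maxNum)
{
    int *dCounter, *dMax, *dOut;
    cudaMalloc(&dCounter, sizeof(int));
    cudaMalloc(&dMax, sizeof(int));
    cudaMalloc(&dOut, maxNum * sizeof(int));

    cudaMemset(dCounter, 0, sizeof(int));
    cudaMemcpy(dMax, &maxNum, sizeof(int), cudaMemcpyHostToDevice);

    appendBounded<<<blocks, threads>>>(dCounter, dMax, dOut);

    // The counter holds the number of attempts, not the number of writes,
    // so clamp it on the host before using it as a size.
    int attempts = 0;
    cudaMemcpy(&attempts, dCounter, sizeof(int), cudaMemcpyDeviceToHost);
    int resultSize = std::min(attempts, maxNum);

    std::vector<int> results(resultSize);
    cudaMemcpy(results.data(), dOut, resultSize * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(dCounter);
    cudaFree(dMax);
    cudaFree(dOut);
    return results;
}
```

Calling runAppendBounded(4, 256, 100) should return 100 values, and runAppendBounded(1, 32, 100) should return 32. Which threads win a slot, and in what order, is not deterministic, so the contents of the buffer can differ from run to run.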