The atomic functions do not provide correct results

chaowei · March 26, 2021, 4:49pm

I tried to test the atomic functions with the testKernel routine.
The project file is simpleAtomicIntrinsics_vs2019.vcxproj from NVIDIA Corporation\CUDA Samples\v11.0\0_Simple\simpleAtomicIntrinsics.

Based on the definition of the atomic functions, the results I see are not correct. Here are the results in Visual Studio at the beginning and at the end of the routine.

Robert_Crovella · March 26, 2021, 5:03pm

The application computes whether the results are correct in the computeGold routine. So I would assume that if computeGold is not reporting an error, things are working correctly.
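As an illustration of that kind of host-side check, here is a minimal sketch. It is not the sample's actual computeGold code: the kernel name addTen and the single counter are hypothetical, and the launch configuration of 64 blocks of 256 threads is assumed from the debugger output posted later in this thread. Every thread adds 10 to one counter, and the host compares the final value against a serially computed reference.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the sample): every thread adds 10 to one counter.
__global__ void addTen(int *counter)
{
    atomicAdd(counter, 10);
}

int main()
{
    const int numBlocks  = 64;   // assumed, matches gridDim.x in the debugger output
    const int numThreads = 256;  // assumed, matches blockDim.x in the debugger output

    int *d_counter = nullptr;
    if (cudaMalloc(&d_counter, sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_counter, 0, sizeof(int));

    addTen<<<numBlocks, numThreads>>>(d_counter);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();  // wait until every thread has finished
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_counter);
        return 1;
    }

    int h_counter = 0;
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);

    // Serial reference: one "+= 10" per launched thread.
    int expected = 0;
    for (int i = 0; i < numBlocks * numThreads; ++i) {
        expected += 10;
    }

    printf("GPU result: %d, host reference: %d\n", h_counter, expected);
    return (h_counter == expected) ? 0 : 1;
}
```

The comparison is only meaningful after the whole grid has finished, which is why the copy back to the host comes after cudaDeviceSynchronize.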

My guess is that you have set breakpoints, or otherwise used the Visual Studio debugging interface, in such a way that the results you are looking at are those after only a single warp has executed.

I would also encourage you where possible to not post text as images but post it as formatted text, when asking for help here.

chaowei · March 26, 2021, 5:16pm

Hi Robert_Crovella, thanks for your reply.
The routine I am testing is not computeGold; it is testKernel.
From the definition of the atomicAdd function, I expect the result of atomicAdd(&g_odata[0],10) to be 10, because the old value of g_odata[0] is 0.
Likewise, I expect the result of atomicSub(&g_odata[1],10) to be -10, because the old value of g_odata[1] is 0.
That is not what I see, and that is the problem.

Robert_Crovella · March 26, 2021, 5:25pm

Perhaps you should study the whole sample code, to understand how it works, rather than just the kernel.

That would be true if only one thread were running. But you have multiple threads running in parallel, and in particular you have threads in a warp executing in lockstep. What you are observing is the result after 32 threads have completed the work, specifically the first warp. You’ll need to understand how a GPU executes code. The debugger does not isolate a single thread for you. When you allow a thread to execute this line:

atomicAdd(&g_odata[0],10);

at a minimum, it will not be one single thread executing that line of code, it will be all the active threads in the warp.

If you would like to see the behavior of just a single thread, in isolation, one way to do that would be to modify that kernel launch, so that only one thread is executing.
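A minimal sketch of that experiment follows. It does not modify the sample itself: the kernel addTenAndRecord and the helper runWithThreads are hypothetical names used only for illustration, and error checking is omitted for brevity. The same kernel is launched once with a single thread and once with one full warp of 32 threads, and each thread stores the old value that atomicAdd returned to it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread adds 10 to a shared counter and records
// the value atomicAdd returned (the counter's value just before its own add).
__global__ void addTenAndRecord(int *counter, int *oldValues)
{
    oldValues[threadIdx.x] = atomicAdd(counter, 10);
}

// Hypothetical helper: launches one block of numThreads threads, copies the
// per-thread old values into h_old, and returns the final counter value.
static int runWithThreads(int numThreads, int *h_old)
{
    int *d_counter = nullptr;
    int *d_old = nullptr;
    cudaMalloc(&d_counter, sizeof(int));
    cudaMalloc(&d_old, numThreads * sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));

    addTenAndRecord<<<1, numThreads>>>(d_counter, d_old);
    cudaDeviceSynchronize();

    int h_counter = -1;
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_old, d_old, numThreads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    cudaFree(d_old);
    return h_counter;
}

int main()
{
    int h_old[32];

    // One thread: the counter goes from 0 to 10, and the thread sees old value 0.
    int single = runWithThreads(1, h_old);
    printf("1 thread:   final = %d, old value seen = %d\n", single, h_old[0]);

    // One warp: 32 adds of 10 each, so the counter ends at 320.
    int warp = runWithThreads(32, h_old);
    printf("32 threads: final = %d\n", warp);

    // Each thread saw a different old value; the order is not guaranteed.
    for (int i = 0; i < 32; ++i) {
        printf("  thread %2d saw old value %d\n", i, h_old[i]);
    }
    return 0;
}
```

With one thread the final value is 10, which matches the single-thread expectation. With 32 threads it is 320, and every thread receives a distinct old value, so no update is lost even though the threads run together.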

chaowei · March 26, 2021, 7:30pm

I looked through the whole project and found that the computeGold routine checks the results from testKernel. computeGold reports no error, which means the results from testKernel are correct.
I then checked the variables in the Locals window to see what happens after calling atomicAdd.

-	[Launch Details]	{…}
	@flatBlockIdx	0	uint64_t
	@flatThreadIdx	0	ulong
+	blockIdx	{ x=0 y=0 z=0 }	uint3
+	threadIdx	{ x=0 y=0 z=0 }	uint3
+	gridDim	{ x=64 y=1 z=1 }	dim3
+	blockDim	{ x=256 y=1 z=1 }	dim3

I found that the thread information did not change, but the value of g_odata[0] changed to 320. If the displayed information is not what one expects, it will be difficult to debug the code.
Your explanation of what happens when the line atomicAdd(&g_odata[0],10) executes is very good.
The value shown is the result after all active threads in the warp have executed that line.
Thank you very much for the explanation.
