System-wide atomic operation failure on multi-4090 systems

I would like to share my sample code here, which exposes erroneous behavior of multi-4090 systems.

https://github.com/mino-hidetoshi/System-wide_Atomic

As far as I have tested, it produces non-deterministic results when run on a multi-4090 system, as follows:

Compilation (system-scope atomics such as atomicAdd_system() require compute capability 6.0 or later, hence -arch=sm_60):

$ nvcc -arch=sm_60 atomic-01.cu

Execution examples:

# single 4090 execution
$ CUDA_VISIBLE_DEVICES=0 ./a.out 
atomic-01.cu 
5000000

# dual 4090 execution
$ CUDA_VISIBLE_DEVICES=0,1 ./a.out 
atomic-01.cu 
5000269

I would like to know if this is common to all multi-4090 environments.
I have tested several cloud GPU instances and got these non-deterministic results without exception. How about yours?

Multi-3090 systems (or older) always produce the correct result of 5000000.

---- added on 2023/02/19

In the sample code, the kernel repeatedly updates a managed variable with atomicAdd_system(). If the atomic operation works correctly, the result should be 5000000. If atomicity is broken, the result becomes unpredictable.
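For readers who do not want to open the repository, here is a minimal sketch of that pattern. It is not the actual atomic-01.cu; the kernel name, launch configuration, and iteration counts are illustrative assumptions, but the mechanism is the same idea: every visible GPU performs system-scope atomic increments on one counter allocated with cudaMallocManaged().

// Minimal sketch of the pattern described above -- not the actual
// atomic-01.cu from the repository; kernel name, launch configuration
// and iteration counts are illustrative choices.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_kernel(unsigned long long *counter, int iters)
{
    // Each thread performs 'iters' system-scope atomic increments.
    for (int i = 0; i < iters; ++i)
        atomicAdd_system(counter, 1ULL);
}

int main()
{
    int n_dev = 0;
    cudaGetDeviceCount(&n_dev);

    // One counter in managed (unified) memory, visible to all GPUs.
    unsigned long long *counter;
    cudaMallocManaged(&counter, sizeof(*counter));
    *counter = 0ULL;

    // 5,000,000 increments in total, split evenly across the visible GPUs
    // (the split is exact for 1 or 2 GPUs with these sizes).
    const int blocks = 100, threads = 100;
    const long long total = 5000000LL;
    const int iters = (int)(total / ((long long)n_dev * blocks * threads));

    for (int d = 0; d < n_dev; ++d) {
        cudaSetDevice(d);
        add_kernel<<<blocks, threads>>>(counter, iters);
    }
    for (int d = 0; d < n_dev; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }

    printf("%llu\n", *counter);   // should print 5000000 if atomicity holds
    cudaFree(counter);
    return 0;
}

With one GPU, all 5,000,000 increments come from a single device; with two GPUs they are split in half, so the final value must still be exactly 5000000 if atomicAdd_system() really is atomic across devices.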

I have now found that this multi-4090 issue is Docker image dependent.

Three examples I tested follow:

nvidia/cuda:11.3.0-devel-ubuntu18.04 : No good (non-deterministic result)
nvidia/cuda:11.3.0-devel-ubuntu20.04 : Good (correct result of 5000000)
nvidia/cuda:12.0.0-devel-ubuntu22.04 : No good (non-deterministic result)
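If you want to check your own machine, a command along these lines should reproduce the test inside one of these images (assuming the NVIDIA Container Toolkit is installed; the image tag and mount path here are just examples):

$ docker run --rm --gpus all -v "$PWD":/work -w /work nvidia/cuda:12.0.0-devel-ubuntu22.04 \
      bash -c "nvcc -arch=sm_60 atomic-01.cu && CUDA_VISIBLE_DEVICES=0,1 ./a.out"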

The Ubuntu 22.04 image appears to be problematic, and older images also seem unreliable.

In summary, this does not look like a GPU issue but rather a Docker image issue.

Sorry if I bothered you.