I’m probing registers c[0], c[1], c[2], c[3] in a CUDA program.
The strange thing is that the results shown in CUDA-GDB are correct, but the results from printf are wrong. The program also writes back wrong results.
I’ve attached my code (cuda.zip). You can build it with: nvcc -o mma_sp runner_mma_sp.cu -arch sm_80 -Xcompiler -fopenmp
Can anyone tell me what’s going on? Thanks in advance.
When I execute your code under compute-sanitizer, the results are correct. I suspect a race condition and/or missing synchronization in your kernel code.
Indeed, when I compile with -lineinfo and run your code with compute-sanitizer --tool racecheck ./mma_sp, thousands of warnings are reported.
For example:
compute-sanitizer --tool racecheck ./runner_mma_sp
========= COMPUTE-SANITIZER
-1.295596 , -1.260879, 3.006134, 2.770409
========= Error: Race reported between Write access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x1d10 in kernels_mma_sp.cuh:174
========= and Read access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x2050 in kernels_mma_sp.cuh:193 [2288 hazards]
=========
========= Error: Race reported between Write access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x1710 in kernels_mma_sp.cuh:166
========= and Read access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x2040 in kernels_mma_sp.cuh:188 [1280 hazards]
Note that with -lineinfo, the report includes the source file and line responsible for each access (e.g. kernels_mma_sp.cuh:174 for the write and kernels_mma_sp.cuh:193 for the conflicting read).
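A write/read hazard like the ones above usually comes from threads writing shared memory and other threads reading it before a barrier. Below is a minimal sketch of that pattern and its fix. It is not your kernel: `reverse_racy`, `reverse_synced`, `kBlock` and `reverse_synced_ok` are hypothetical names, and I'm assuming the hazards in kernels_mma_sp.cuh involve shared memory written by some threads and read by others. A racy kernel can still appear to work, which would fit what you see under CUDA-GDB and compute-sanitizer, where execution timing differs from a normal run.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size for this illustration only.
constexpr int kBlock = 256;

// RACY: each thread writes its own slot of `tile`, then reads a slot written
// by a different thread. With no barrier in between, the read may see stale
// data. racecheck reports this as a write/read hazard on shared memory.
__global__ void reverse_racy(const float* in, float* out) {
    __shared__ float tile[kBlock];
    int t = threadIdx.x;
    int base = blockIdx.x * kBlock;
    tile[t] = in[base + t];                // write
    out[base + t] = tile[kBlock - 1 - t];  // read of another thread's slot
}

// FIXED: __syncthreads() makes every thread in the block wait until all
// writes to `tile` are complete and visible before anyone reads from it.
__global__ void reverse_synced(const float* in, float* out) {
    __shared__ float tile[kBlock];
    int t = threadIdx.x;
    int base = blockIdx.x * kBlock;
    tile[t] = in[base + t];
    __syncthreads();
    out[base + t] = tile[kBlock - 1 - t];
}

// Runs reverse_synced on `blocks` blocks and returns true if each block's
// slice of the output is the reverse of the matching slice of the input.
bool reverse_synced_ok(int blocks) {
    const int n = blocks * kBlock;
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    reverse_synced<<<blocks, kBlock>>>(d_in, d_out);
    cudaError_t launch_err = cudaGetLastError();  // launch configuration errors
    // The device-to-host copy waits for the kernel and reports execution errors.
    cudaError_t copy_err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return false;

    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < kBlock; ++t)
            if (h_out[b * kBlock + t] != h_in[b * kBlock + kBlock - 1 - t]) return false;
    return true;
}
```

If you launch reverse_racy from a small host program built with -lineinfo and run it under compute-sanitizer --tool racecheck, the hazard should be reported between the tile write and tile read lines, while reverse_synced should be clean. In your kernel, the equivalent fix would be a barrier between the write at line 166/174 and the read at line 188/193. Use __syncthreads() if the writer and reader threads can be in different warps, or __syncwarp() if only threads of one warp exchange data.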