Different results from printf and cuda-gdb

I’m probing registers c[0], c[1], c[2], c[3] in a CUDA program.
The strange thing is that the values shown in CUDA-GDB are correct, but the values printed by printf are wrong. The program ends up writing back the wrong results.
I attach my code here:
cuda.zip (6.0 KB)
You can compile it with:
nvcc -o mma_sp runner_mma_sp.cu -arch sm_80 -Xcompiler -fopenmp
Can anyone tell me what’s going on? Thanks in advance.

cuda-gdb version: 12.6

When I execute your code under compute-sanitizer, the results are correct. I assume there is a race condition and/or missing synchronization in your kernel code.
Indeed, when I compile with -lineinfo and run your code with compute-sanitizer --tool racecheck ./mma_sp, thousands of hazards are reported.

For example:

compute-sanitizer --tool racecheck ./runner_mma_sp 
========= COMPUTE-SANITIZER
-1.295596 , -1.260879, 3.006134, 2.770409
========= Error: Race reported between Write access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x1d10 in kernels_mma_sp.cuh:174
=========     and Read access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x2050 in kernels_mma_sp.cuh:193 [2288 hazards]
========= 
========= Error: Race reported between Write access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x1710 in kernels_mma_sp.cuh:166
=========     and Read access at void mma_sp_m16n8k16_ptx_v1<(int)32, (int)32>(__half *, __half *, int, int, int, float *, unsigned short *)+0x2040 in kernels_mma_sp.cuh:188 [1280 hazards]

Note that with -lineinfo the report includes the source code lines which are responsible for the warnings.

Thank you so much. I was missing a __syncthreads() after the cooperative fetch into shared memory.
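For anyone who lands here with the same symptom, below is a minimal sketch of the pattern. It is not the code from cuda.zip: the names reverse_tile, TILE and run_reverse_demo are hypothetical and only there for illustration. Each thread writes one slot of a shared-memory tile and then reads a slot written by a different thread, so a barrier is needed between the two steps. Without it the result depends on warp timing, which is probably why the values can look correct under cuda-gdb or compute-sanitizer and wrong in a normal run.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 256;  // hypothetical tile size: one element per thread

// Each block stages TILE elements in shared memory, then every thread reads
// the mirrored slot, i.e. an element that a different thread wrote.
__global__ void reverse_tile(const float *in, float *out)
{
    __shared__ float tile[TILE];
    const int base = blockIdx.x * TILE;

    // Cooperative fetch: thread t fills slot t.
    tile[threadIdx.x] = in[base + threadIdx.x];

    // Barrier: all writes to tile[] must be complete before any thread reads
    // a slot it did not write itself. Removing this line introduces the kind
    // of write/read hazard that racecheck reports.
    __syncthreads();

    out[base + threadIdx.x] = tile[TILE - 1 - threadIdx.x];
}

// Host-side helper (hypothetical): runs the kernel and checks the output.
bool run_reverse_demo()
{
    const int blocks = 4;
    const int n = blocks * TILE;
    const size_t bytes = n * sizeof(float);

    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    reverse_tile<<<blocks, TILE>>>(d_in, d_out);
    cudaError_t err = cudaGetLastError();              // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }

    // Every block's output must be its input tile in reverse order.
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < TILE; ++t)
            if (h_out[b * TILE + t] != h_in[b * TILE + (TILE - 1 - t)])
                return false;
    return true;
}
```

Calling run_reverse_demo() from a main function and building with nvcc -lineinfo lets you confirm with compute-sanitizer --tool racecheck that the hazards only appear once the __syncthreads() line is removed.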
