I’m probing registers c[0], c[1], c[2], and c[3] in a CUDA program.
The strange thing is that the values shown in CUDA-GDB are correct, but the values printed by printf are wrong. The program ends up writing back the wrong results.
I have attached my code here: cuda.zip (6.0 KB). You can compile it with:
nvcc -o mma_sp runner_mma_sp.cu -arch sm_80 -Xcompiler -fopenmp
Can anyone tell me what’s going on? Thanks in advance.