cuda race detector detecting race from sgemm_32x32x32_NN

alazarr58cd · July 29, 2018, 10:13pm

Hi,

The CUDA race detector is telling me there is a race condition in my call to cublasSgemm().

How is that possible?

Thanks!

alazarr58cd · July 29, 2018, 11:34pm

Update:

ERROR: Potential RAW hazard detected at shared 0x4202 in block (1, 0, 21) :
========= Write Thread (63, 0, 0) at 0x00000068 in sgemm_32x32x32_NN
========= Read Thread (0, 0, 0) at 0x00000078 in sgemm_32x32x32_NN
========= Current Value : 0
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuLaunchKernel + 0x1e8) [0x1fe6b0]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcublas.so.9.0 [0x252180]

========= ERROR: Potential RAW hazard detected at shared 0x4200 in block (0, 0, 21) :
========= Write Thread (63, 0, 0) at 0x00000068 in sgemm_32x32x32_NN
========= Read Thread (64, 0, 0) at 0x00000078 in sgemm_32x32x32_NN
========= Current Value : 0
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuLaunchKernel + 0x1e8) [0x1fe6b0]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcublas.so.9.0 [0x252180]

========= ERROR: Potential RAW hazard detected at shared 0x4201 in block (1, 0, 21) :
========= Write Thread (63, 0, 0) at 0x00000068 in sgemm_32x32x32_NN
========= Read Thread (0, 0, 0) at 0x00000078 in sgemm_32x32x32_NN
========= Current Value : 0
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuLaunchKernel + 0x1e8) [0x1fe6b0]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcublas.so.9.0 [0x252180]

========= ERROR: Potential RAW hazard detected at shared 0x4200 in block (1, 0, 21) :
========= Write Thread (63, 0, 0) at 0x00000068 in sgemm_32x32x32_NN
========= Read Thread (0, 0, 0) at 0x00000078 in sgemm_32x32x32_NN
========= Current Value : 0
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuLaunchKernel + 0x1e8) [0x1fe6b0]
========= Host Frame:/usr/local/cuda-9.0/lib64/libcublas.so.9.0 [0x252180]

Robert_Crovella · July 29, 2018, 11:45pm

I would recommend filing a bug at developer.nvidia.com

njuffa · July 30, 2018, 1:57am

I note that the report says “Potential RAW hazard” (RAW = read-after-write), so this isn’t necessarily indicative of a bug in CUBLAS.

It could be that a tricky piece of code makes it look like there is a RAW hazard, while the overall construction of the code prevents it from actually turning into one.

It is helpful to file a bug report with NVIDIA (as suggested by Robert_Crovella) because if there is an actual RAW hazard in *GEMM, NVIDIA would want to fix it, and if it turns out to be a false positive, they might want to think about improving the RAW hazard detection so false positives occur less frequently.
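For the bug report, it may help to condense the race detector output into its distinct hazards instead of pasting every repetition. Below is a minimal Python sketch of one way to do that. It assumes the log lines look exactly like the ones quoted earlier in this thread; the regular expressions, the `summarize` function, and the `SAMPLE` excerpt are illustrative names of mine, not part of any NVIDIA tool, and other tool versions may format their output differently.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread (assumption:
# other versions of the race detector may print these lines differently).
HAZARD = re.compile(
    r"Potential (\w+) hazard detected at shared (0x[0-9a-fA-F]+) "
    r"in block \((\d+), (\d+), (\d+)\)"
)
ACCESS = re.compile(
    r"(Write|Read) Thread \((\d+), (\d+), (\d+)\) "
    r"at (0x[0-9a-fA-F]+) in (\S+)"
)


def summarize(log_text):
    """Count hazards grouped by (kind, kernel, write PC, read PC)."""
    counts = Counter()
    current = None  # hazard record being assembled, if any
    for line in log_text.splitlines():
        m = HAZARD.search(line)
        if m:
            current = {"kind": m.group(1)}
            continue
        m = ACCESS.search(line)
        if m and current is not None:
            current[m.group(1)] = m.group(5)   # "Write" or "Read" -> code offset
            current["kernel"] = m.group(6)
            if "Write" in current and "Read" in current:
                key = (current["kind"], current["kernel"],
                       current["Write"], current["Read"])
                counts[key] += 1
                current = None
    return counts


# Excerpt of the first two reports from this thread, backtraces omitted.
SAMPLE = """\
ERROR: Potential RAW hazard detected at shared 0x4202 in block (1, 0, 21) :
========= Write Thread (63, 0, 0) at 0x00000068 in sgemm_32x32x32_NN
========= Read Thread (0, 0, 0) at 0x00000078 in sgemm_32x32x32_NN
========= ERROR: Potential RAW hazard detected at shared 0x4200 in block (0, 0, 21) :
========= Write Thread (63, 0, 0) at 0x00000068 in sgemm_32x32x32_NN
========= Read Thread (64, 0, 0) at 0x00000078 in sgemm_32x32x32_NN
"""

if __name__ == "__main__":
    for key, count in summarize(SAMPLE).items():
        print(key, count)
```

All four reports posted above have the same write offset (0x00000068) and read offset (0x00000078) in sgemm_32x32x32_NN, so a summary like this collapses them into a single entry. That makes it easier to see that the tool is flagging one pair of instructions repeatedly, not several unrelated hazards.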