Hello,
Please tell me about the memory access bug fixed in ‘CUDA Ver11.5 Update1’ described in ‘1.4. Resolved Issues’ of ‘NVIDIA CUDA Toolkit Release Notes’ linked below.
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#title-new-features
Description:
1.4. Resolved Issues
1.4.1. CUDA Compilers
11.5. Update 1
・Fixed a compiler optimization bug that may move memory access instructions across memory barriers that may lead to incorrect runtime results with certain synchronization dependencies.
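To make the question concrete, here is a minimal sketch of the kind of pattern I understand the note to describe. This is my own illustration, not code from the release notes or from my program: the names message_passing and run_message_passing and the buffer layout are hypothetical. One block writes data, issues a memory barrier, and then sets a flag, while another block waits for the flag and reads the data. If the compiler moved a memory access across either __threadfence(), the reader could see stale data.

```cuda
#include <cuda_runtime.h>

// Hypothetical illustration of a flag-based synchronization dependency.
// Block 0 is the producer, block 1 is the consumer.
__global__ void message_passing(volatile int *payload, volatile int *ready, int *out)
{
    if (blockIdx.x == 0) {
        *payload = 42;       // (1) write the data
        __threadfence();     // (2) barrier: (1) must be visible before (3)
        *ready = 1;          // (3) publish the flag
    } else {
        while (*ready == 0) { }  // wait until the producer sets the flag
        __threadfence();         // the data read must stay after the flag read
        *out = *payload;         // read the data published by block 0
    }
}

// Host helper (assumed name): returns the value seen by the consumer block,
// or -1 if any CUDA call fails.
int run_message_passing()
{
    int *buf = nullptr;  // buf[0] = payload, buf[1] = ready, buf[2] = out
    if (cudaMalloc(&buf, 3 * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(buf, 0, 3 * sizeof(int)) != cudaSuccess) {
        cudaFree(buf);
        return -1;
    }

    // Two single-thread blocks, so both are resident at the same time.
    message_passing<<<2, 1>>>(buf, buf + 1, buf + 2);

    int result = -1;
    cudaError_t err = cudaMemcpy(&result, buf + 2, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(buf);
    return err == cudaSuccess ? result : -1;
}
```

Calling run_message_passing() from main and comparing the result with the value written in step (1) would show whether the ordering held. My own kernel may not use this pattern at all, which is part of what I am trying to find out.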
Question:
Could you give me more detail about the conditions under which this memory access problem occurs?
-Does it occur only on the RTX 3000 series, and not on the GTX 1000 series or RTX 2000 series?
-The note says it is a CUDA compiler optimization bug. Does it occur only with particular compilation options?
-What is the corresponding error code?
The version I’m using is:
CUDA 11.4.0
Linux x86_64 Driver 470.42.01
The phenomenon I’m encountering is that a program that works fine on the GTX 1000 series and RTX 2000 series causes an error on the RTX 3000 series.
Of course, each GPU is compatible with the above versions.
The error occurs at the point in the source code where the result of a calculation is assigned to a member of a structure.
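As a rough sketch of what that assignment looks like, the code below uses a hypothetical stand-in structure (InfoSketch) and hypothetical names (calc_sketch, run_calc_sketch); the real fields and kernel body differ. It assumes one block of 256 threads, adds an index guard before the structure write, and checks the errors returned by the launch and by cudaDeviceSynchronize(), which is one way to see whether an out-of-range index is involved.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real structure; field names are assumptions.
struct InfoSketch {
    float value;
    int   index;
};

__global__ void calc_sketch(const float *in, int n, InfoSketch *info, int infoCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n || i >= infoCount) return;  // guard: skip threads past the arrays
    info[i].value = in[i] * 2.0f;          // result assigned to a structure member
    info[i].index = i;
}

// Host helper (assumed name): returns how many elements hold the expected
// result, or -1 if any CUDA call reports an error.
int run_calc_sketch()
{
    const int n = 100;  // fewer elements than the 256 threads launched
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr;
    InfoSketch *d_info = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_info, n * sizeof(InfoSketch)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    calc_sketch<<<1, 256>>>(d_in, n, d_info, n);
    cudaError_t launchErr = cudaGetLastError();       // launch configuration errors
    cudaError_t syncErr   = cudaDeviceSynchronize();  // errors raised while running
    if (launchErr != cudaSuccess || syncErr != cudaSuccess) {
        printf("launch: %s, sync: %s\n",
               cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));
        cudaFree(d_in);
        cudaFree(d_info);
        return -1;
    }

    InfoSketch h_info[n];
    cudaMemcpy(h_info, d_info, n * sizeof(InfoSketch), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_info);

    int ok = 0;
    for (int i = 0; i < n; ++i)
        if (h_info[i].value == h_in[i] * 2.0f && h_info[i].index == i) ++ok;
    return ok;
}
```

Without the guard, threads 100 to 255 in this sketch would write past the end of info, which is the kind of access I would expect to be reported as an out-of-range address.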
Running this program on the GTX 1000 series and RTX 2000 series works fine.
On the RTX 3000 series, however, running it under cuda-gdb reports the following:
CUDA Exception: Warp Out-of-range Address
The exception was triggered at PC 0x11ef020
Thread 15 "TestTool" received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 0, grid 140, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 2, lane 0]
0x00000000011ef0a0 in TEST::CalcBeforeProc(float*, float*, int, CalcDateList*, int, CalcDateInfo*, int, int, int, int, int, int, int, int)<<<(1,1,1),(256,1,1)>>> ()
Is this error related to the memory access bug fixed in ‘CUDA Ver11.5 Update1’?