About the memory access bug fixed in 'CUDA Ver11.5 Update1'


Please tell me about the memory access bug fixed in ‘CUDA Ver11.5 Update1’ described in ‘1.4. Resolved Issues’ of ‘NVIDIA CUDA Toolkit Release Notes’ linked below.
Release Notes :: CUDA Toolkit Documentation

1.4. Resolved Issues
1.4.1. CUDA Compilers
11.5. Update 1
・Fixed a compiler optimization bug that may move memory access instructions across memory barriers that may lead to incorrect runtime results with certain synchronization dependencies.

Please give me more detailed information about the conditions under which this memory access problem occurs.
- Does it occur only on the RTX 3000 series, and not on the GTX 1000 series and RTX 2000 series?
- The note says it is a CUDA compiler optimization bug. Does it depend on particular compilation options?
- What is the corresponding error code?

The version I’m using is:
CUDA 11.4.0
Linux x86_64 Driver 470.42.01

The problem I’m encountering is that a program that works fine on the GTX 1000 series and RTX 2000 series fails with an error on the RTX 3000 series.
Each GPU is, of course, compatible with the versions above.
The error occurs at the point in the source code where the result of a calculation is assigned to a member variable of a structure.
When I ran the program under cuda-gdb on the RTX 3000 series, the following was displayed.

CUDA Exception: Warp Out-of-range Address
The exception was triggered at PC 0x11ef020

Thread 15 "TestTool" received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 0, grid 140, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 2, lane 0]
0x00000000011ef0a0 in TEST::CalcBeforeProc(float*, float*, int, CalcDateList*, int, CalcDateInfo*, int, int, int, int, int, int, int, int)<<<(1,1,1),(256,1,1)>>> ()

Is this error related to the memory access bug fixed in ‘CUDA Ver11.5 Update1’?

I don’t believe it is related.

The optimization being referred to is something like (at the SASS level):

LD ...
(memory barrier)
LD ...

The release note indicates that there won’t be an “error code” or a runtime error associated with this. Instead, it “may lead to incorrect runtime results”, i.e. the way you would detect the problem is through incorrect calculation results.
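
To make "synchronization dependencies" a bit more concrete, here is a minimal sketch of the kind of code where the position of a load relative to a memory barrier matters. It is only an illustration, not the code from the release note or from your application: the kernel `message_passing` and the helper `run_message_passing` are hypothetical names. Block 0 writes a payload and then a flag, block 1 waits for the flag and then reads the payload, with `__threadfence()` between the two accesses on each side.

```cuda
#include <cuda_runtime.h>

// Hypothetical example: block 0 publishes a value, block 1 consumes it.
__global__ void message_passing(volatile int *data, volatile int *flag, int *out)
{
    if (threadIdx.x != 0) return;

    if (blockIdx.x == 0) {
        *data = 42;          // 1. write the payload
        __threadfence();     // 2. barrier: payload must be visible before the flag
        *flag = 1;           // 3. publish the flag
    } else {
        while (*flag == 0) { }   // wait until the flag is published
        __threadfence();         // barrier: the payload load must stay below this point
        *out = *data;            // a load moved above the barrier could see the old value
    }
}

// Host helper (hypothetical): returns the value seen by the consumer, or -1 on error.
int run_message_passing()
{
    int *d_buf = nullptr;    // d_buf[0] = data, d_buf[1] = flag, d_buf[2] = out
    if (cudaMalloc(&d_buf, 3 * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_buf, 0, 3 * sizeof(int)) != cudaSuccess) {
        cudaFree(d_buf);
        return -1;
    }

    // Two blocks of one thread each, so both blocks are resident at the same time.
    message_passing<<<2, 1>>>(d_buf, d_buf + 1, d_buf + 2);

    int out = -1;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&out, d_buf + 2, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    return (err == cudaSuccess) ? out : -1;
}
```

With the barriers respected, `run_message_passing()` should return 42. If a load were moved across the barrier, the symptom would be a wrong value here, not a fault or an error code.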

This isn’t going to create a warp-out-of-range-address condition where there wasn’t one previously.
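
By contrast, an out-of-range access is a fault during kernel execution, and it shows up on the host as a non-success `cudaError_t`. A rough sketch of checking for that after a launch is below. The struct `Result`, the kernel `store_results` and the helper `launch_and_check` are made up for illustration and are not taken from your program.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for a structure whose member receives a calculation result.
struct Result { float value; };

__global__ void store_results(Result *results, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                          // bounds check keeps every access in range
        results[i].value = in[i] * 2.0f;  // assign the result to the struct member
    }
}

// Launches the kernel and returns the first CUDA error encountered, if any.
cudaError_t launch_and_check(int n)
{
    Result *d_results = nullptr;
    float *d_in = nullptr;

    cudaError_t err = cudaMalloc(&d_results, n * sizeof(Result));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_in, n * sizeof(float));
    if (err != cudaSuccess) { cudaFree(d_results); return err; }

    err = cudaMemset(d_in, 0, n * sizeof(float));
    if (err == cudaSuccess) {
        store_results<<<(n + 255) / 256, 256>>>(d_results, d_in, n);
        err = cudaGetLastError();            // errors detected at launch time
        if (err == cudaSuccess) {
            err = cudaDeviceSynchronize();   // faults raised while the kernel ran
        }
    }

    cudaFree(d_in);
    cudaFree(d_results);
    return err;
}
```

Passing the returned value to `cudaGetErrorString()` gives a readable message. Running the application under compute-sanitizer is another way to locate the faulting access.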

As a simple check, upgrade your test machine to the latest CUDA version.
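
When comparing before and after, it can help to log exactly which combination each run used. A small sketch, assuming device 0 is the GPU under test; `decode_version` and `print_cuda_environment` are hypothetical helper names. Note that `cudaDriverGetVersion` reports the latest CUDA version the installed driver supports, not the driver release number (such as 470.42.01).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor (e.g. 11040 for 11.4).
void decode_version(int v, int *major, int *minor)
{
    *major = v / 1000;
    *minor = (v % 1000) / 10;
}

// Prints the CUDA runtime version, the CUDA version supported by the driver,
// and the name and compute capability of device 0.
cudaError_t print_cuda_environment()
{
    int runtime_version = 0;
    int driver_version = 0;

    cudaError_t err = cudaRuntimeGetVersion(&runtime_version);
    if (err != cudaSuccess) return err;
    err = cudaDriverGetVersion(&driver_version);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);   // assumption: device 0
    if (err != cudaSuccess) return err;

    int major = 0, minor = 0;
    decode_version(runtime_version, &major, &minor);
    std::printf("CUDA runtime: %d.%d\n", major, minor);
    decode_version(driver_version, &major, &minor);
    std::printf("Driver supports CUDA: %d.%d\n", major, minor);
    std::printf("GPU: %s (compute capability %d.%d)\n",
                prop.name, prop.major, prop.minor);
    return cudaSuccess;
}
```

Calling `print_cuda_environment()` at program start on each test machine records the runtime, driver capability and GPU alongside the test result.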

Thank you for your advice.
I will further investigate the cause, including trying with the latest CUDA.

Final result report

After updating CUDA and the Linux driver to newer versions, this error no longer occurs.
I did not change the source code of my application.
I don’t know the exact cause of the error because no error occurs when using the debugger.

  • Combination where the error occurred
    CUDA: 11.4.0
    Linux x86_64 Driver: 470.42.01
    GPU: RTX3060

  • Error-free combination
    CUDA: 11.6.0
    Linux x86_64 Driver: 510.39.01
    GPU: RTX3060
