CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

yu.fang1 · January 11, 2023, 9:01pm

This happens during multi-GPU training with newer TensorFlow releases (2.8 and later), including the NGC TensorFlow versions.
The same code works fine with TF 2.4 in multi-GPU training, or with a single GPU.
The model includes an LSTM layer that takes an input plus a mask.
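
For readers without the attachment, here is a minimal sketch of the kind of setup described above. It is not the attached code: `right_padded_mask` and `build_model` are hypothetical names, the layer sizes are arbitrary, and I am assuming the mask is passed to the LSTM as a separate boolean input under `tf.distribute.MirroredStrategy`.

```python
def right_padded_mask(lengths, timesteps):
    """Boolean mask rows: True for the first n steps, False for the padding."""
    return [[t < n for t in range(timesteps)] for n in lengths]


def build_model(timesteps, features, units=64):
    """Functional Keras model: an LSTM fed a sequence input and a mask input."""
    import tensorflow as tf  # lazy import so the mask helper works without TF

    # One replica per visible GPU; falls back to a single device otherwise.
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        seq = tf.keras.Input(shape=(timesteps, features), name="seq")
        mask = tf.keras.Input(shape=(timesteps,), dtype="bool", name="mask")
        # Passing mask= routes the sequence lengths into the LSTM kernel.
        h = tf.keras.layers.LSTM(units)(seq, mask=mask)
        out = tf.keras.layers.Dense(1)(h)
        model = tf.keras.Model([seq, mask], out)
        model.compile(optimizer="adam", loss="mse")
    return model
```

The mask rows would be converted to a boolean array before being fed to `model.fit` alongside the sequence tensor.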

Some tests done so far:
Without the LSTM layer: no error.
With the LSTM layer but without the mask input: no error.
With the LSTM layer and the mask input, but with cudnnRNNV3 disabled: no error.
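
The tests above can be written down as a small table of variants, sketched below. The names `Variant`, `VARIANTS`, `hits_masked_cudnn_path` and `make_recurrent_layer` are hypothetical. The post does not say how cudnnRNNV3 was disabled, so as an assumption the sketch stands in a generic `RNN(LSTMCell)` wrapper, which Keras runs without the cuDNN kernel; this may not match what was actually done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    use_lstm: bool
    use_mask: bool
    use_cudnn: bool
    fails: bool  # outcome as reported in the post (multi-GPU, TF 2.8+)


VARIANTS = [
    Variant("no LSTM", False, False, False, False),
    Variant("LSTM, no mask", True, False, True, False),
    Variant("LSTM + mask, cuDNN kernel avoided", True, True, False, False),
    Variant("LSTM + mask (original model)", True, True, True, True),
]


def hits_masked_cudnn_path(v):
    """True only when an LSTM with a mask is allowed to use the cuDNN kernel."""
    return v.use_lstm and v.use_mask and v.use_cudnn


def make_recurrent_layer(v, units=64):
    """Return the Keras layer for a variant, or None when the LSTM is removed."""
    import tensorflow as tf  # lazy import: the table above needs no TF

    if not v.use_lstm:
        return None
    if v.use_cudnn:
        return tf.keras.layers.LSTM(units)
    # Assumption: a generic RNN wrapper around LSTMCell stands in for
    # "cudnnRNNV3 disabled"; it uses the non-cuDNN implementation.
    return tf.keras.layers.RNN(tf.keras.layers.LSTMCell(units))
```

In this table the only failing row is the one where `hits_masked_cudnn_path` is true, which is consistent with the fault sitting in the masked cuDNN LSTM path rather than in the rest of the model.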

Note: the error may only appear after several epochs. With a small batch size (for example, one that uses about 10% of GPU memory) the error may disappear, but we would like to use the maximum GPU memory.

This has already been reported; the bug ID is 3938371.
The code is attached.

code.zip (7.2 KB)
