CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

This happens during multi-GPU training with newer TensorFlow versions (2.8 and later), including the NGC TensorFlow versions.
The same code works fine with TF 2.4 multi-GPU training, and on a single GPU.
The model includes an LSTM layer that takes an input plus a mask.

Some tests done so far:
Without the LSTM layer: no error.
With the LSTM layer but without the mask input: no error.
With the LSTM layer and the mask input, but with cudnnRNNV3 disabled: no error.
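
For readers without the attachment, the configurations tested above can be sketched roughly as follows. This is a hypothetical stand-in, not the attached code: `build_model`, `right_padded_mask`, the layer sizes, and the shapes are all assumptions made for illustration. It only shows an LSTM receiving an explicit mask under `tf.distribute.MirroredStrategy`, and it is not claimed to reproduce the crash.

```python
import numpy as np
import tensorflow as tf


def build_model(timesteps, features, use_lstm=True, use_mask=True, units=8):
    """Hypothetical stand-in for the attached model (names and sizes are assumed)."""
    x_in = tf.keras.Input(shape=(timesteps, features), name="x")
    inputs = [x_in]
    mask = None
    if use_lstm and use_mask:
        # Boolean mask input, True for real timesteps and False for padding.
        m_in = tf.keras.Input(shape=(timesteps,), dtype="bool", name="mask")
        inputs.append(m_in)
        mask = m_in
    if use_lstm:
        # Default LSTM arguments keep the layer eligible for the cuDNN kernel on GPU.
        h = tf.keras.layers.LSTM(units)(x_in, mask=mask)
    else:
        # Test variant without the LSTM layer.
        h = tf.keras.layers.GlobalAveragePooling1D()(x_in)
    out = tf.keras.layers.Dense(1)(h)
    model = tf.keras.Model(inputs, out)
    model.compile(optimizer="adam", loss="mse")
    return model


def right_padded_mask(lengths, timesteps):
    """Build a right-padded boolean mask from per-sequence lengths."""
    return np.arange(timesteps)[None, :] < np.asarray(lengths)[:, None]


# MirroredStrategy replicates the model on every visible GPU
# (it falls back to a single device when fewer than two GPUs are present).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_model(timesteps=5, features=3, use_lstm=True, use_mask=True)
```

Setting `use_lstm=False` or `use_mask=False` gives the first two test variants. The third variant (cudnnRNNV3 disabled) is not shown, because the sketch does not assume how that switch was set.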

Note: the error may only appear after some epochs. With a small batch size (for example, one that uses about 10% of GPU memory) the error may disappear, but we would like to use as much GPU memory as possible.

This has already been reported; the bug ID is 3938371.
The code is attached. (7.2 KB)