The error occurs during multi-GPU training with newer TensorFlow versions (2.8 and later), including the NGC TensorFlow (NGC-TF) releases.
The same code works fine with multi-GPU training on TF 2.4, or when training on a single GPU.
The model includes an LSTM layer that takes an input plus a mask.
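For readers without the attachment, a minimal sketch of this kind of setup follows: a Keras LSTM fed an explicit boolean mask, built and trained under tf.distribute.MirroredStrategy. The shapes, layer sizes, loss and random data are assumptions for illustration only, not the attached model. With default LSTM arguments and a right-padded mask, Keras can dispatch the layer to the fused cuDNN kernel on GPU, which is the code path the tests below point at.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes; the real model is in the attached code.zip.
TIMESTEPS, FEATURES, UNITS = 20, 8, 32

def build_model():
    x_in = tf.keras.Input(shape=(TIMESTEPS, FEATURES), name="x")
    mask_in = tf.keras.Input(shape=(TIMESTEPS,), dtype=tf.bool, name="mask")
    # Explicit mask passed to the LSTM. With default arguments and a
    # right-padded mask, Keras may use the cuDNN kernel when running on GPU.
    h = tf.keras.layers.LSTM(UNITS)(x_in, mask=mask_in)
    out = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model(inputs=[x_in, mask_in], outputs=out)

def make_data(n):
    # Random placeholder data (hypothetical), not the real dataset.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((n, TIMESTEPS, FEATURES)).astype("float32")
    lengths = rng.integers(1, TIMESTEPS + 1, size=n)
    # Right-padded mask: True for the first `length` steps of each sequence.
    mask = np.arange(TIMESTEPS)[None, :] < lengths[:, None]
    y = rng.standard_normal((n, 1)).astype("float32")
    return x, mask, y

# MirroredStrategy replicates the model on every visible GPU
# (it falls back to a single device when no GPU is visible).
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_model()
    model.compile(optimizer="adam", loss="mse")

x, mask, y = make_data(256)
history = model.fit([x, mask], y, batch_size=64, epochs=1, verbose=0)
```

Note that batch_size passed to fit() is the global batch, which MirroredStrategy splits across the replicas.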
Tests done so far:
Without the LSTM layer: no error.
With the LSTM layer but without the mask input: no error.
With the LSTM layer and the mask input, but with CudnnRNNV3 disabled: no error.
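The post does not say how CudnnRNNV3 was disabled. One common way to keep a masked LSTM off the cuDNN kernel, sketched below as an assumption rather than the method used here, is to wrap tf.keras.layers.LSTMCell in the generic tf.keras.layers.RNN layer, which always runs the plain TensorFlow ops. The flag name and sizes are hypothetical; both variants have the same weights, so they can be swapped to isolate the cuDNN path, at some cost in speed.

```python
import tensorflow as tf

TIMESTEPS, FEATURES, UNITS = 20, 8, 32  # hypothetical sizes

def build_model(use_cudnn_capable_layer):
    x_in = tf.keras.Input(shape=(TIMESTEPS, FEATURES), name="x")
    mask_in = tf.keras.Input(shape=(TIMESTEPS,), dtype=tf.bool, name="mask")
    if use_cudnn_capable_layer:
        # tf.keras.layers.LSTM may pick the fused cuDNN kernel on GPU.
        rnn = tf.keras.layers.LSTM(UNITS)
    else:
        # The generic RNN wrapper around LSTMCell never uses the cuDNN
        # kernel, so the masked LSTM runs through regular TF ops instead.
        rnn = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(UNITS))
    h = rnn(x_in, mask=mask_in)
    out = tf.keras.layers.Dense(1)(h)
    return tf.keras.Model(inputs=[x_in, mask_in], outputs=out)

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    cudnn_model = build_model(use_cudnn_capable_layer=True)
    generic_model = build_model(use_cudnn_capable_layer=False)
```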
Note: the error may only appear after several epochs. With a small batch size (e.g. one that uses about 10% of GPU memory) the error may not occur, but we would like to use the full GPU memory.
This has already been reported; the bug ID is 3938371.
The code is attached: code.zip (7.2 KB).