Error with torch.nn.init.orthogonal_ in the NVIDIA PyTorch container image, release 21.06

Hello,

I’m using the NVIDIA containers for PyTorch with Docker and nvidia-docker version 20.10.7, on linux/amd64 with Ubuntu 18.04, with an RTX 3090 as GPU.

With the PyTorch container nvcr.io/nvidia/pytorch:21.03-py3, my code runs flawlessly. But with the latest container (nvcr.io/nvidia/pytorch:21.06-py3), I get the following error:

File "/home/project/nn/layers/initialize.py", line 158, in lsuv_iterations
torch.nn.init.orthogonal_(weight_swap)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/init.py", line 461, in orthogonal_
q, r = torch.linalg.qr(flattened)
RuntimeError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling cusolverDnXgeqrf( handle, params, m, n, CUDA_R_32F, reinterpret_cast<void*>(A), lda, CUDA_R_32F, reinterpret_cast<void*>(tau), CUDA_R_32F, reinterpret_cast<void*>(bufferOnDevice), workspaceInBytesOnDevice, reinterpret_cast<void*>(bufferOnHost), workspaceInBytesOnHost, info)

The call torch.nn.init.orthogonal_(weight_swap) triggers the error; here weight_swap is a tensor stored on the GPU.
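
A possible stopgap, sketched under assumptions: since the failure comes from the cuSOLVER-backed torch.linalg.qr call on the GPU, the orthogonal initialization could be retried on a CPU tensor and copied back. The helper name orthogonal_init_with_cpu_fallback and the 64x32 shape are hypothetical (the real shape of weight_swap is not shown here), and this has not been verified against the 21.06 container. CUDA errors can leave the context in a bad state, so the fallback may not help in every case.

```python
import torch


def orthogonal_init_with_cpu_fallback(weight: torch.Tensor) -> torch.Tensor:
    """Orthogonally initialise `weight` in place, retrying on CPU if the device call fails.

    Hypothetical helper, not part of PyTorch.
    """
    try:
        # Normal path: runs the QR decomposition on the tensor's own device.
        return torch.nn.init.orthogonal_(weight)
    except RuntimeError:
        # Fallback: build the orthogonal matrix on the CPU, then copy it back.
        cpu_copy = torch.empty(weight.shape, dtype=weight.dtype, device="cpu")
        torch.nn.init.orthogonal_(cpu_copy)
        with torch.no_grad():
            weight.copy_(cpu_copy)
        return weight


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumed shape for illustration only.
    weight_swap = torch.empty(64, 32, device=device)
    orthogonal_init_with_cpu_fallback(weight_swap)
    # Version details that are useful to attach to a bug report.
    print(torch.__version__, torch.version.cuda, weight_swap.device)
```

Running the script in both containers and comparing the printed versions would also show which PyTorch and CUDA builds are involved.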

I just wanted to report this error. It’s not that problematic to me, as I can use pytorch:21.03-py3 instead of pytorch:21.06-py3 for the moment, but I think it needs to be fixed for the next releases.

Also, I will need to use torchaudio in the near future, so I’d be interested if someone knows how to install it inside the NVIDIA container.

Best regards,
Thomas