I’m using the NVIDIA containers for PyTorch with Docker and nvidia-docker version 20.10.7, on linux/amd64 with Ubuntu 18.04 and an RTX 3090 GPU.
File "/home/project/nn/layers/initialize.py", line 158, in lsuv_iterations
File "/opt/conda/lib/python3.8/site-packages/torch/nn/init.py", line 461, in orthogonal_
q, r = torch.linalg.qr(flattened)
RuntimeError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling
cusolverDnXgeqrf( handle, params, m, n, CUDA_R_32F, reinterpret_cast<void*>(A), lda, CUDA_R_32F, reinterpret_cast<void*>(tau), CUDA_R_32F, reinterpret_cast<void*>(bufferOnDevice), workspaceInBytesOnDevice, reinterpret_cast<void*>(bufferOnHost), workspaceInBytesOnHost, info)
The call “torch.nn.init.orthogonal_(weight_swap)” triggered the error; here weight_swap is a tensor stored on the GPU.
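For anyone who wants to check their own setup, here is a minimal sketch of a reproduction attempt. The helper name try_orthogonal_init and the tensor shape (64, 32) are assumptions made for illustration; they are not taken from the project code, and the real weight_swap may have a different shape. The script runs the same call on the CPU and, if one is available, on the GPU, and reports whether a RuntimeError was raised.

```python
import torch


def try_orthogonal_init(device, shape=(64, 32)):
    """Hypothetical helper: run orthogonal_ on `device`.

    Returns (succeeded, error_message_or_None).
    """
    # The shape is an arbitrary assumption, not the one from the original project.
    weight_swap = torch.empty(*shape, device=device)
    try:
        torch.nn.init.orthogonal_(weight_swap)
    except RuntimeError as err:
        return False, str(err)
    return True, None


if __name__ == "__main__":
    print("torch", torch.__version__, "| CUDA", torch.version.cuda)
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for device in devices:
        print(device, try_orthogonal_init(device))
```

If the problem is specific to the GPU path, the "cpu" line should report success while the "cuda" line carries the cusolver message.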
I just wanted to report this error. It’s not much of a problem for me, since I can use pytorch:21.03-py3 instead of pytorch:21.06-py3 for the moment, but I think it should be fixed in upcoming releases.
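Another possible stopgap, for anyone who has to stay on the newer image, might be to run the orthogonal initialization on a CPU tensor and copy the result to the GPU, since the CPU path does not go through cuSOLVER. This is only a sketch under that assumption: the helper name orthogonal_init_via_cpu is hypothetical, the shape is arbitrary, and it has not been verified against the 21.06 container.

```python
import torch


def orthogonal_init_via_cpu(weight):
    """Hypothetical helper: fill `weight` in place with an orthogonal init computed on the CPU."""
    # Build the orthogonal matrix on the CPU so the QR step avoids the GPU solver.
    cpu_copy = torch.empty(weight.shape, dtype=weight.dtype, device="cpu")
    torch.nn.init.orthogonal_(cpu_copy)
    # Copy the values back to wherever `weight` lives, without tracking gradients.
    with torch.no_grad():
        weight.copy_(cpu_copy)
    return weight


device = "cuda" if torch.cuda.is_available() else "cpu"
weight_swap = torch.empty(64, 32, device=device)  # arbitrary example shape
orthogonal_init_via_cpu(weight_swap)
```

The extra host-to-device copy is a one-time cost at initialization, so it should matter little unless the initialization is repeated many times.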
Also, I will need to use torchaudio in the near future, so I’d be interested to hear if anyone knows how to install it inside the NVIDIA container.