Hi everyone,
I am having trouble using the GPU inside a container.
I am on JetPack 4.6 and have pulled the official containers nvcr.io/nvidia/l4t-tensorflow:r32.6.1-tf2.5-py3 and nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.8-py3.
The following command starts the container without any errors:
sudo docker run -it --network host nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.8-py3
However, after starting python3 and running import tensorflow as tf, it prints:
W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.10.2'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda-10.2/targets/aarch64-linux/lib:
2021-12-14 14:40:50.610016: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
The same thing happens in the PyTorch container: after starting python3 and running import torch, I get the following:
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
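To narrow this down without importing TensorFlow or PyTorch, the two libraries named in the errors above can be probed directly from python3 inside the container. This is only a minimal sketch: check_libs is a helper name I made up, and it assumes that loading a library by name with ctypes uses the same dynamic loader search path (including LD_LIBRARY_PATH) as the frameworks do.

```python
import ctypes

# Library names copied from the two error messages above.
LIBS = ["libcudart.so.10.2", "libcurand.so.10"]


def check_libs(names):
    """Return {name: bool}, True if the dynamic loader can open the library."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be opened
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_libs(LIBS).items():
        print(name, "found" if ok else "MISSING")
```

If both come back as missing, the CUDA libraries are not visible inside the container at all, which would match the errors from both frameworks.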
After adding --runtime nvidia to the docker run command above, I get the following error message:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: error adding seccomp filter rule for syscall clone3: permission denied: unknown.
ERRO[0008] error waiting for container: context canceled
I searched for this error. Someone reported that downgrading Docker and rebooting solves the problem. I followed this suggestion, but the error message stays the same.
I downgraded Docker from 20.10.10 to 20.10.7:
Docker version 20.10.7, build 20.10.7-0ubuntu5~18.04.3
and containerd to version 1.5.2-0ubuntu1~18.04.3 (github.com/containerd/containerd).
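In case anyone wants to compare against their own setup, the version strings above can be collected on the host with a short script. This is only a sketch: tool_version is a helper name I made up, and it assumes docker and containerd are on the PATH (it returns None for a tool that is not installed).

```python
import subprocess


def tool_version(cmd):
    """Run a version command and return its output, or None if the tool is missing."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # text mode; also works on Python 3.6
        )
    except OSError:
        # Raised when the executable cannot be found or started.
        return None
    return proc.stdout.strip()


if __name__ == "__main__":
    for cmd in (["docker", "--version"], ["containerd", "--version"]):
        print(" ".join(cmd), "->", tool_version(cmd))
```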
Could anyone help me or offer some suggestions? Thank you very much.
Sincerely,
Yilin