Update to libnvidia-container-tools has broken docker functionality

I have been using the container nvcr.io/nvidia/pytorch:21.07-py3 daily for the last few weeks without any issues. Last night I accepted an Ubuntu update dialog, and now the container no longer starts. The error message is:

$ nvidia-docker run -p 8888:8888 --ipc=host --gpus all -it --rm -v /home/orca/:/home/t/ nvcr.io/nvidia/pytorch:21.07-py3 
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: signal: segmentation fault (core dumped), stdout: , stderr:: unknown.

I get the same error message when trying docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

Looking through my update logs (/var/log/apt/history.log), it seems the following update is what broke functionality:

Start-Date: 2021-08-16 19:49:10
Commandline: aptdaemon role='role-commit-packages' sender=':1.148'
Upgrade: libnvidia-container-tools:amd64 (1.4.0-1, 1.5.0~rc.1-1), openssl:amd64 (1.1.1-1ubuntu2.1~18.04.9, 1.1.1-1ubuntu2.1~18.04.10), libnvidia-container1:amd64 (1.4.0-1, 1.5.0~rc.1-1), gir1.2-snapd-1:amd64 (1.49-0ubuntu0.18.04.2, 1.58-0ubuntu0.18.04.0), libsnapd-glib1:amd64 (1.49-0ubuntu0.18.04.2, 1.58-0ubuntu0.18.04.0), libssl1.1:amd64 (1.1.1-1ubuntu2.1~18.04.9, 1.1.1-1ubuntu2.1~18.04.10), nvidia-container-toolkit:amd64 (1.5.1-1, 1.6.0~rc.1-1)
End-Date: 2021-08-16 19:49:13
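
For anyone hitting the same thing, the previous versions can be read straight off the Upgrade line. Below is a rough sketch, assuming a shell with grep -E and sed -E available; the function name previous_versions is made up for illustration, and upgrade_line is an excerpt of the log entry above (on a real system you would pull the line from /var/log/apt/history.log instead).

```shell
# Excerpt of the Upgrade line from the log entry above (non-NVIDIA packages trimmed).
upgrade_line='Upgrade: libnvidia-container-tools:amd64 (1.4.0-1, 1.5.0~rc.1-1), openssl:amd64 (1.1.1-1ubuntu2.1~18.04.9, 1.1.1-1ubuntu2.1~18.04.10), libnvidia-container1:amd64 (1.4.0-1, 1.5.0~rc.1-1), nvidia-container-toolkit:amd64 (1.5.1-1, 1.6.0~rc.1-1)'

# previous_versions (hypothetical helper): print "package=old_version" for every
# package whose name contains "nvidia" in an apt history Upgrade line.
previous_versions() {
  printf '%s\n' "$1" \
    | grep -oE '[^ ]*nvidia[^ :]*:[^ ]+ \([^,]+' \
    | sed -E 's/:[^ ]+ \(/=/'
}

previous_versions "$upgrade_line"
# Prints:
# libnvidia-container-tools=1.4.0-1
# libnvidia-container1=1.4.0-1
# nvidia-container-toolkit=1.5.1-1
```

The output is already in the package=version form that apt install accepts, which makes it easy to check exactly what changed before deciding what to roll back.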

Here is the output of nvidia-smi:

$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   51C    P8    30W / 320W |   1070MiB / 10014MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1191      G   /usr/lib/xorg/Xorg                 36MiB |
|    0   N/A  N/A      1312      G   /usr/bin/gnome-shell              103MiB |
|    0   N/A  N/A      2130      G   /usr/lib/xorg/Xorg                452MiB |
|    0   N/A  N/A      2299      G   /usr/bin/gnome-shell               86MiB |
|    0   N/A  N/A      2679      G   ...AAAAAAAAA= --shared-files       99MiB |
|    0   N/A  N/A      3470      G   /usr/lib/firefox/firefox          249MiB |
|    0   N/A  N/A      3600      G   /usr/lib/firefox/firefox            3MiB |
|    0   N/A  N/A      3761      G   /usr/lib/firefox/firefox            3MiB |
|    0   N/A  N/A      9373      G   /usr/lib/firefox/firefox            3MiB |
|    0   N/A  N/A      9515      G   /usr/lib/firefox/firefox            3MiB |
+-----------------------------------------------------------------------------+

Finally, nvidia-container-cli no longer seems to work:

$ nvidia-container-cli info
Segmentation fault (core dumped)

Thanks for any help in fixing this issue!


Same issue

Downgrading to the previous package versions works for now:

apt install libnvidia-container1=1.4.0-1 libnvidia-container-tools=1.4.0-1 nvidia-container-toolkit=1.5.1-1
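
Since the breakage came in through a routine update, the next update would probably pull the release candidates back in. One possible way to avoid that is to put the three packages on hold with apt-mark. This is only a sketch: hold_packages and the RUN switch are made up for illustration, and by default the function just prints the command instead of running it.

```shell
# hold_packages (hypothetical helper): put the given packages on hold so apt
# keeps the currently installed versions during upgrades.
# RUN is a hypothetical switch: unset, the command is only echoed as a preview;
# set it to the empty string (RUN=) to actually run the command.
hold_packages() {
  ${RUN-echo} sudo apt-mark hold "$@"
}

# Package names taken from the downgrade command above.
hold_packages libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
# With RUN unset this prints:
# sudo apt-mark hold libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
```

Once a fixed release is out, sudo apt-mark unhold with the same package names should lift the hold again, and apt-mark showhold lists what is currently held.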


Thanks so so much, this is working for me too!
