Modulus 22.07 Container version for Linux issue

My docker load failed when loading Modulus 22.07. I inspected the tar.gz archive with tar ztf and got the error below. Is there something wrong with modulus_image_v22.07.tar.gz?

f8a5c25010c4cc19ea8296cac8c204271966193c216d48eee4fcb28b91524018/json
f8a5c25010c4cc19ea8296cac8c204271966193c216d48eee4fcb28b91524018/layer.tar
fa61d082a571db03626aa8223571d65035298c6d8668db722420c678163e10b8/
fa61d082a571db03626aa8223571d65035298c6d8668db722420c678163e10b8/VERSION
fa61d082a571db03626aa8223571d65035298c6d8668db722420c678163e10b8/json
fa61d082a571db03626aa8223571d65035298c6d8668db722420c678163e10b8/layer.tar

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
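For what it's worth, a gzip "unexpected end of file" almost always means the archive was truncated in transit rather than damaged in place. A minimal Python sketch that reproduces this failure mode on a throwaway file (file names here are illustrative, not the real archive):

```python
import gzip
import os
import tempfile

# Write a small gzip file, then truncate it to simulate an interrupted download.
path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(path, "wb") as f:
    f.write(b"x" * 100_000)

with open(path, "r+b") as f:
    f.truncate(os.path.getsize(path) // 2)  # chop off the tail, like a partial transfer

# Reading the truncated archive fails the same way `tar ztf` did.
try:
    with gzip.open(path, "rb") as f:
        f.read()
except EOFError as e:
    print("truncated archive:", e)
```

If a fresh download fails at the same point, the uploaded file itself is likely bad; if not, it was a partial transfer.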

I also went the other way to get the image, NVIDIA NGC, and used the pull command for the 22.07 image. This is what I pulled:

REPOSITORY                       TAG       IMAGE ID       CREATED        SIZE
nvcr.io/nvidia/modulus/modulus   22.07     c3e6e5db96a5   2 weeks ago    16.7GB
modulus                          22.03.1   97fc8407bc47   2 months ago   15.9GB

modulus:22.03.1 runs fine in Docker. However, I get the following for 22.07:

sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v /home/chris/modulus/examples:/examples -it nvcr.io/nvidia/modulus/modulus:22.07 bash
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/dab22ad8e7238d197ea7a05199d9dc728424bc2e4584f986b92e6856f1fc493d/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0000] error waiting for container: context canceled

What is up with this image?


I’m having the same issue; I had to downgrade to the v22.03.1 image instead.

I am using Windows 11, WSL2, Ubuntu 20.04.

Fri Jul 22 19:11:25 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   41C    P8    18W /  N/A |    988MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

is what shows up in the WSL2 Ubuntu terminal from nvidia-smi, and even from within the 22.03.1 container:

Fri Jul 22 23:13:41 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   41C    P8    18W /  N/A |   1000MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

but when I run 22.07:

sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v ${PWD}/examples:/examples -it nvcr.io/nvidia/modulus/modulus:22.07 bash
Unable to find image 'nvcr.io/nvidia/modulus/modulus:22.07' locally
22.07: Pulling from nvidia/modulus/modulus
d5fd17ec1767: Already exists
06e980b25883: Already exists
d98931827455: Already exists
3cc9550cd86a: Already exists
c03552cc2849: Already exists
a50cc00269e8: Already exists
3d84b6f81809: Already exists
4ecbefd7921a: Already exists
fffa258a7fcf: Already exists
927a6e4d269a: Already exists
c4ace063e89c: Already exists
c15f7417808e: Already exists
ab690c8f4d73: Already exists
4f4fb700ef54: Already exists
1b09cf07ddb0: Already exists
e9d2ba4e38a5: Already exists
e146b957980e: Already exists
7ee51fbaddfe: Already exists
0e37aa477dee: Already exists
64c360f034e2: Already exists
c5d4ad524b2f: Already exists
0c8ab9153e0b: Already exists
aa7c55cba550: Already exists
5988f6065a39: Already exists
7ceb6ed9943b: Already exists
9888ae33bccd: Already exists
80ba0803c815: Already exists
cb5eca4c89ad: Already exists
8b5feddcfded: Already exists
a17c208aadcd: Already exists
1c8f104115ad: Already exists
9db775429ad5: Already exists
618c64051de8: Already exists
b3d83c7b35db: Already exists
ad4483ce0370: Already exists
de894ea2e3c4: Already exists
4969876c8e54: Already exists
38ac99992f92: Already exists
a40c9cc1c42b: Already exists
d1685070ebe0: Already exists
4827f0e8e627: Already exists
3364ef9b7be4: Already exists
94174e7bf8d5: Already exists
0d3fddaed5f6: Already exists
c49f8f3af0ec: Already exists
7200f0d5811f: Already exists
fcf0f8cae7dc: Already exists
85ec9d9e0ec1: Already exists
3403a2f4e9d6: Already exists
ada487d07900: Already exists
93b33a1a5032: Already exists
89de5eccdc47: Already exists
7c7534ea69b1: Already exists
6cc7d27896b3: Already exists
0214948832c8: Already exists
73fa1223c839: Already exists
5c762c89ade7: Already exists
ca1ba9ca06b9: Already exists
d7fafa231018: Already exists
371cd50d6fbe: Already exists
caa11ea8ab29: Already exists
Digest: sha256:80955c3667348f362a23f2db0f7ee39b577ed8c88406390d9069c4466d8c33de
Status: Downloaded newer image for nvcr.io/nvidia/modulus/modulus:22.07
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/c1b8917dd55d244779d7b3f9048582b7023c32d76c4e8b595f3ba57bf1d19df2/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0003] error waiting for container: context canceled

I’m not really sure where to go from here. I have tried almost everything I can find, and nothing seems to break 22.03.1, nor fix 22.07.

I also note the following nvidia-docker2 info from apt:

$ apt list -a --installed | grep nvidia-docker2

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

nvidia-docker2/bionic,now 2.11.0-1 all [installed]
nvidia-docker2/bionic 2.10.0-1 all
nvidia-docker2/bionic 2.9.1-1 all
nvidia-docker2/bionic 2.9.0-1 all
nvidia-docker2/bionic 2.8.0-1 all
nvidia-docker2/bionic 2.7.0-1 all

I will also note that I can run the 22.07 image if I leave off "--gpus all", so even though apt shows nvidia-docker2 installed, maybe it is not registering the environment the way 22.07 needs? Of course, without --gpus all it tells me: "WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use the NVIDIA Container Toolkit to start this container with GPU support; see the NVIDIA Cloud Native Technologies documentation."

Here is what happens when I try to run 22.07 on CPU only:

sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /home/chris/modulus/examples:/examples -it nvcr.io/nvidia/modulus/modulus:22.07 bash

=============
== PyTorch ==
=============

NVIDIA Release 22.05 (build 37432893)
PyTorch Version 1.12.0a0+8a1a93a

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

root@9a93e5bfcb11:/examples# cd waveguide/cavity_2D/
root@9a93e5bfcb11:/examples/waveguide/cavity_2D# python waveguide2D_TMz.py
Traceback (most recent call last):
  File "waveguide2D_TMz.py", line 7, in <module>
    from modulus.csv_utils.csv_rw import csv_to_dict
ModuleNotFoundError: No module named 'modulus.csv_utils'
root@9a93e5bfcb11:/examples/waveguide/cavity_2D#

But without GPUs, 22.03.1 runs it fine, just slowly…
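Both tracebacks are a ModuleNotFoundError, which suggests the example scripts' import paths predate a package reorganization in 22.07 rather than a broken install. A generic way to find where a symbol now lives is to walk the installed package; this is a minimal sketch, demonstrated on the stdlib json package so it runs anywhere. Inside the container one could try find_symbol("modulus", "csv_to_dict") the same way (that call is an assumption, not a confirmed fix):

```python
import importlib
import pkgutil

def find_symbol(package_name: str, symbol: str):
    """Walk a package's submodules and report which ones define `symbol`."""
    pkg = importlib.import_module(package_name)
    hits = []
    for info in pkgutil.walk_packages(pkg.__path__, prefix=package_name + "."):
        try:
            mod = importlib.import_module(info.name)
        except Exception:
            continue  # some submodules may not import standalone
        if hasattr(mod, symbol):
            hits.append(info.name)
    return hits

# Demo on the stdlib: where does JSONDecodeError live?
print(find_symbol("json", "JSONDecodeError"))
```

Once the new location is known, the example's import line can be updated to match.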

Something is wrong with the 22.07 container.

I had the same error: Error processing tar file (exit status 1): unexpected EOF

Hi @imsiradws47004374 @gnieuwenhuis and @prakhar_sharma

Thanks for reporting this. Is this the image on DevZone (the Modulus download page)? If so, please try downloading it again; we uploaded a new image that should hopefully resolve this issue.
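If a checksum is published alongside the download, comparing it before running docker load rules out a truncated transfer. A minimal sketch: the streaming read keeps a 16 GB archive out of RAM, and the demo file and digest comparison are illustrative (the real file name and expected digest would come from the download page):

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a tiny temp file; for the real archive you would compare
# sha256_of("modulus_image_v22.07.tar.gz") against the published digest.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(b"hello")
print(sha256_of(path))
```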

Note (for future reference): we also provide Modulus containers on NGC, which can be pulled as an alternative option: Modulus | NVIDIA NGC

Let me know if the issue still persists.

Thank you! Yes, I had issues (although slightly different; I think I put the output in the posts) that prevented each image from running. The NGC pull did create the image locally in my Docker, but it failed to run; some dependency wasn’t there. The downloaded tar.gz image would not even load (unexpected EOF).

Thanks again for your help! That is much appreciated!

I now get the same docker runtime error from both methods of getting the image:

sudo docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v /home/chris/modulus/examples:/examples -it nvcr.io/nvidia/modulus/modulus:22.07 bash
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/0df94e4800a249b24ebcac7da5164cf517ec827b89fff9238cc4937b6bca877f/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0000] error waiting for container: context canceled

22.03.1 runs fine.

Hi @imsiradws47004374 ,

This looks more like an nvidia-docker issue. My suggestion would be to make sure your CUDA drivers are up to date with what is listed in the user guide, and that your Docker Desktop is up to date.

What hardware are you running on?

Otherwise, I would suggest looking through the NVIDIA Container Toolkit issues on GitHub for solutions that may apply to your system.

Hello again, thank you for your help!

Here is some diagnostic info:


NVidia-DockerInfo.txt (81 Bytes)
NVidiaDriverInfoWindows

NVidia-SMI_Ubuntu.txt (1.4 KB)
NVidia-SMI_Windows.txt (2.7 KB)
WSL2Info.txt (541 Bytes)

installed nvidia-docker: nvidia-docker2/bionic,now 2.11.0-1 all [installed]

The larger PNG covers my hardware (Alienware X17 R2), and the smaller PNG shows the Windows NVIDIA driver info. The text files show the output of nvidia-smi in Windows PowerShell and in the Ubuntu bash shell, the version of nvidia-docker reported in Ubuntu Linux, and the WSL2 version.

Under this setup, 22.03.1 runs well.

22.07 seems to fail here: mount error: file creation failed: /var/lib/docker/overlay2/cbf295d0847a9ea1e856d012a9b2fb94781d2747095d870ac394abca5682339c/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown

If I run the 22.07 container without GPU support, Modulus starts and I get a bash root prompt. But when I try to run the wave_equation/wave_1d.py example (expecting it to run, just very slowly), I get the following error:

root@9a85ff0ab82a:/examples/wave_equation# python wave_1d.py
Traceback (most recent call last):
  File "wave_1d.py", line 7, in <module>
    from modulus.continuous.solvers.solver import Solver
ModuleNotFoundError: No module named 'modulus.continuous'

Thanks again!
Chris

Hi @imsiradws47004374 ,

Thanks for the system information. Unfortunately, it seems there’s an open issue with the NVIDIA Container Toolkit on WSL2; perhaps that is the reason. I’ll have to point you to the NVIDIA Docker GitHub for more information. Sorry about this!

GitHub issue: