Problem using modulus 22.07 in WSL2

Hi,

I tried to upgrade Modulus from 22.03 to 22.07. 22.03 worked, but 22.07 doesn’t. I’m using WSL2.

I downloaded the *.tar image file and loaded it successfully.
However, running the command
docker run --gpus all --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 7007:7007 -v /home/user/Modulus_2207/myprojects_2207:/myprojects_2207 -it modulus:22.07 bash
gave the error:

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/8802aa6c9c154dbba3d8fdefbae602451e75390392b3070f01303a2083c0f85e/merged/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: file exists: unknown.
ERRO[0001] error waiting for container: context canceled

The error comes from "--gpus all".

If I remove it, the container starts, but running an example fails with a complaint that no GPU is present.
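
A quick way to check whether the image already ships that library (which is what the "file exists" message suggests) is to list the path in a container started without --gpus all. This is only a sketch: image_has_file is a made-up helper name, and the image tag and path are the ones from the command and error above.

```shell
# Hypothetical helper (not part of Docker or Modulus):
# succeeds if the given path exists inside the given image.
# The container is started WITHOUT --gpus, so the GPU hook that
# raised the error above is not requested.
image_has_file() {
    # $1 = image tag, $2 = absolute path inside the image
    docker run --rm --entrypoint ls "$1" -l "$2" >/dev/null 2>&1
}

# Usage (tag and path copied from the post above):
#   image_has_file modulus:22.07 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 && echo "present in image"
```

If the helper succeeds, the file is part of the image itself rather than something mounted at start-up.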

If I use:
docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    --runtime nvidia -v ${PWD}/examples:/examples \
    -it modulus:22.07 bash

I get the error:
docker: Error response from daemon: Unknown runtime specified nvidia
Btw, I have already installed the nvidia-docker2 package.
On native Ubuntu Linux, the second docker command works, though.
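
To see whether the Docker daemon used from WSL2 actually knows a runtime called nvidia, the registered runtimes can be listed with docker info. A minimal sketch; runtime_registered is a hypothetical helper that simply greps the JSON map docker info prints.

```shell
# Hypothetical helper: reads the JSON runtime map on stdin and succeeds
# if a runtime with the given name is one of its keys.
runtime_registered() {
    # $1 = runtime name, e.g. nvidia
    grep -q "\"$1\":"
}

# Usage:
#   docker info --format '{{json .Runtimes}}' | runtime_registered nvidia || echo "no nvidia runtime registered"
```

If the name is missing, the daemon was not configured with that runtime, which matches the error above.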
Can anyone help?

Thanks.

Hi @tsltaywb

Unfortunately, we updated the nvidia-docker version used in 22.07, and there currently seems to be a bug with it running on WSL. This GitHub issue has the precise error message that I believe you are seeing. Perhaps you can get some information there (it seems some people have figured out a workaround):

Other users have seen this issue as well. Sorry about this!

Hi ngeneva,

Thanks for the tips. I will give it a try!

Hi, I have found a solution. The steps are as follows:

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.08-py3
exit
docker run --gpus all --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864  -p 7006:7006 -v /modulus_source_location/:/modulus_2209  -it nvcr.io/nvidia/pytorch:22.08-py3 bash
cd ../modulus_2209
pip3 install matplotlib transforms3d future typing numpy quadpy \
             numpy-stl==2.16.3 h5py sympy==1.5.1 termcolor psutil \
             symengine==0.6.1 numba Cython chaospy torch_optimizer \
             vtk omegaconf hydra-core==1.1.1 einops \
             timm tensorboard pandas orthopy ndim pint

Do NOT install functorch with a plain pip install, since you will be forced to uninstall the current PyTorch and install 1.12; a pinned commit is installed below instead.
Go to the source directory and install:

python setup.py install
pip install "git+https://github.com/pytorch/functorch.git@590e8618"
If you try to run an example at this point, it will complain about missing libraries, so you'll need to install them:
apt-get update
apt-get install -y libx11-6
apt-get install -y libgl1-mesa-glx
apt-get install -y libxrender1
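
One way to confirm the libraries are now visible to the dynamic linker is to look them up in the ldconfig cache. This is a sketch: missing_libs is a hypothetical helper, and the library names in the usage line are the ones these three packages normally provide on Ubuntu.

```shell
# Hypothetical helper: prints every shared-library name from its
# arguments that is NOT listed in the dynamic linker cache.
missing_libs() {
    cache="$(ldconfig -p)"
    for lib in "$@"; do
        printf '%s\n' "$cache" | grep -qF "$lib" || echo "$lib"
    done
}

# Usage (inside the container); prints nothing when all three are found:
#   missing_libs libX11.so.6 libGL.so.1 libXrender.so.1
```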

Then it should work. You can save the modified container as a new image:

docker ps -a

Find the container ID of the one you just modified and run:

docker commit d8cf130b5f37 modulus_2209
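
To check that the committed image still sees the GPU, one option is to start it with --gpus all and ask PyTorch. A sketch, assuming the image name modulus_2209 from the commit command above; gpu_visible is a hypothetical helper name.

```shell
# Hypothetical helper: runs a one-off container that asks PyTorch whether
# CUDA is available and prints the last line of its output (True or False).
# tail -n 1 drops any startup banner the image prints first.
gpu_visible() {
    # $1 = image name
    docker run --rm --gpus all "$1" \
        python -c 'import torch; print(torch.cuda.is_available())' | tail -n 1
}

# Usage:
#   gpu_visible modulus_2209
```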

Note that I haven’t been able to get pysdf working. If anyone manages to get it working, please let me know how.
Thanks.
