I am trying to build a Dockerized Jupyter notebook container based on the DLI image, with PyTorch graph libraries added on top.
After checking which versions of Torch and CUDA ship in this base image, I want to run the following as part of the Dockerfile:
RUN pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.6.0+cu102.html
This should install the correct versions. However, when the build reaches this particular package install, I receive the following error message:
The installer appears to try every version of torch-scatter in turn, and each attempt fails with the same error:
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
This happens for every version it tries to download.
I know this install command works for specific Torch and CUDA versions on Google Colab, so I am unsure whether the problem comes from installing via Docker.
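As a quick diagnostic for the OSError above, a minimal sketch like the following can be run inside the container to see whether the dynamic loader can find the library at all. The helper name can_load is hypothetical, and the check only tells you whether the library opens, not why it is missing.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Same failure class as the "cannot open shared object file" error.
        return False


if __name__ == "__main__":
    # libcurand.so.10 is the library named in the error message.
    name = "libcurand.so.10"
    print(name, "loadable:", can_load(name))
```

If this reports that the library is not loadable, the CUDA libraries are probably not visible inside the container at that point, which would be independent of torch-scatter itself.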
For transparency, linked below is the Dockerfile I am using, and the build command I use is also pasted:
May I know which JetPack version you used to set up the Jetson Nano?
Since the container is built on top of r32.5.0, please use JetPack 4.5.x for compatibility.
I am using the most recent JetPack, which is 4.6. If that is the problem, should I re-install a 4.5.x version? Alternatively, is there a different image I can build on top of?
A quick update, and a question about whether this behavior is expected: I have changed the base image to nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1 to be compatible with my JetPack 4.6. This also changes which versions of torch-scatter and its requisite packages I should use (i.e. torch 1.9, not 1.6).
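Because the wheel index URL has to match the Torch and CUDA versions in the base image, here is a small sketch of how that URL can be derived instead of typed by hand. The function name wheel_index_url is hypothetical, and the URL pattern is simply the one used in the pip commands in this thread. Inside the container the two inputs could come from torch.__version__ and torch.version.cuda.

```python
def wheel_index_url(torch_version: str, cuda_version: str) -> str:
    """Build the -f URL for pip from a Torch version and a CUDA version."""
    # torch.__version__ may carry a local suffix such as "+cu102"; drop it.
    base = torch_version.split("+")[0]
    # "10.2" becomes "cu102", matching the tag used in the URL.
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return f"https://pytorch-geometric.com/whl/torch-{base}+{cuda_tag}.html"


if __name__ == "__main__":
    # Example values for the r32.6.1 image as described in this post.
    print(wheel_index_url("1.9.0", "10.2"))
```

Note that pointing pip at the matching index does not guarantee that a prebuilt wheel exists for the platform. If none matches, pip falls back to building from the source tarball.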
The new problem is that when I run sudo docker build -t nsk367/jetson_gnn ., this package never seems to finish installing. After all of the earlier steps of the Docker build complete, the build spends an unreasonable amount of time on this package.
The build output is pasted below:
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container e59382fbcab2
---> 9586aeab378e
Step 9/14 : RUN pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
---> Running in 74699431815b
Looking in links: https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
Collecting torch-scatter
Downloading torch_scatter-2.0.8.tar.gz (21 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: torch-scatter
Building wheel for torch-scatter (setup.py): started
Building wheel for torch-scatter (setup.py): still running...
Building wheel for torch-scatter (setup.py): still running...
Building wheel for torch-scatter (setup.py): still running...
Building wheel for torch-scatter (setup.py): still running...
Have you tried it on a JetPack 4.5.1 environment?
If yes, did it work?
Not sure if this is the root cause.
But it is common for Docker to run out of memory and get stuck.
To address this, please run it with the configuration suggested in the comment below:
I have not yet tried the JetPack 4.5.1 environment, but I think I will give that a shot now.
Instead of re-installing, I first tried replacing the base image with nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.6.1, but the issue remained.
All of this has been happening during docker build, not docker run, since that is when I install these packages. I then deferred the installation of these packages and added --memory=500M --memory-swap=8G to my docker run script, but unfortunately that did not help either.
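To check whether memory really is the bottleneck, one option is to look at the available RAM and swap while the wheel is compiling. The sketch below reads /proc/meminfo, which exists on Linux. The helper name parse_meminfo is hypothetical, and the numbers only show what the kernel reports, not any limit Docker applies on top.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
            if key in info:
                print(f"{key}: {info[key] / 1024:.0f} MiB")
```

If MemAvailable and SwapFree both drop toward zero while the "still running..." lines repeat, that would be consistent with the build being starved of memory rather than truly hung.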