The conda environments in NVIDIA’s pytorch containers are inconsistent.
The simplest way to reproduce is to run any conda install command in the newest container, for example:
docker run --rm -it --entrypoint /bin/bash nvcr.io/nvidia/pytorch:22.07-py3
$ conda install conda
Then look for “The environment is inconsistent” near the top of the output (the command takes a while, and the warning is buried near the top of very long output, so you have to scroll up to see it).
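Since the warning is easy to miss in the scrollback, one option is to capture the install output and grep it for the warning text. A minimal sketch, run here against a small simulated log so the search itself is demonstrable (in the container you would first save the real output with `conda install conda 2>&1 | tee conda-install.log`):

```shell
# Sketch: search a captured conda log for the inconsistency warning
# instead of scrolling. The log contents below are simulated sample
# lines, not output captured from the container.
log=$(mktemp)
printf '%s\n' \
  'Collecting package metadata (current_repodata.json): done' \
  'Solving environment: done' \
  'The environment is inconsistent, please check the package plan carefully' \
  > "$log"
grep -n 'The environment is inconsistent' "$log"
```

`grep -n` prints the matching line with its line number, so the warning is visible even when the full log runs to thousands of lines.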
The last container in which this doesn’t happen is 22.01.
However, even the 22.01 container still has a possibly related issue.
Inside it, conda and pip don’t see opencv as installed: running
$ conda list | grep cv
$ pip list | grep cv
in the container gives empty results.
But the package files are present in the environment as if it were installed, namely in /opt/conda/lib/python3.8/site-packages/cv2, containing an old opencv version, 3.4.11.
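This kind of stale install can be spotted generically: a top-level directory in site-packages with no accompanying *.dist-info or *.egg-info metadata is invisible to `pip list` even though Python can import it. A sketch on a scratch directory (the package names here are made up; in the container, SITE would be /opt/conda/lib/python3.8/site-packages):

```shell
# Sketch: find package directories in a site-packages that have no
# *.dist-info / *.egg-info metadata and are therefore invisible to
# `pip list`. Uses a scratch directory with made-up contents.
SITE=$(mktemp -d)
mkdir -p "$SITE/cv2" "$SITE/requests" "$SITE/requests-2.28.0.dist-info"
orphans=""
for d in "$SITE"/*/; do
  name=$(basename "$d")
  # skip metadata and cache directories themselves
  case "$name" in *.dist-info|*.egg-info|__pycache__) continue ;; esac
  found=0
  for meta in "$SITE/$name"-*.dist-info "$SITE/$name"-*.egg-info; do
    [ -d "$meta" ] && found=1
  done
  if [ "$found" -eq 0 ]; then
    orphans="$orphans $name"
    echo "no metadata (invisible to pip): $name"
  fi
done
```

In this sketch only `cv2` is reported, because `requests` has a matching dist-info directory, which mirrors the situation in the 22.01 container: the cv2 files exist, but nothing in the package metadata points at them.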
Because of this, after running the following:
docker run --rm -it --entrypoint /bin/bash nvcr.io/nvidia/pytorch:22.01-py3
$ conda list | grep cv
$ conda install opencv -y
$ conda list | grep cv
$ python3 -c "import cv2; print(cv2.__version__)"
even though conda installs opencv 4.6.0, the python command still prints 3.4.11, which causes all sorts of things to fail. A workaround is to
rm -rf the old cv2 files from /opt/conda/lib/python3.8/site-packages/cv2.
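The workaround can be sketched as follows, here against a scratch directory rather than a live container (in the 22.01 container, SITE would be /opt/conda/lib/python3.8/site-packages, and the conda step would actually run):

```shell
# Sketch of the workaround on a scratch directory: delete the stale cv2
# tree so the freshly installed opencv is the copy Python imports.
# In the container: SITE=/opt/conda/lib/python3.8/site-packages
SITE=$(mktemp -d)
mkdir -p "$SITE/cv2"          # stands in for the leftover 3.4.11 files
rm -rf "$SITE/cv2"            # the workaround: remove the stale tree
# conda install opencv -y     # then (re)install so `import cv2` gets 4.6.0
[ -e "$SITE/cv2" ] || echo "stale cv2 removed"
```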
This also happens in the newest container, 22.07, even if I first activate the conda base environment with:
$ conda init
$ . ~/.bashrc
$ conda activate