Description
The TensorFlow container starts Jupyter Server in a read-only directory, so Jupyter Lab cannot be used.
Environment
TensorRT Version: 24.10, 24.09, and 24.07 discussed
GPU Type: various (HPC environment with a100, l40, l40s, rtx6k, titan)
Nvidia Driver Version: installed in container
CUDA Version: installed in container
CUDNN Version: installed in container
Operating System + Version: Linux Rocky 8 + Apptainer
Python Version (if applicable): installed in container
TensorFlow Version (if applicable): installed in container
PyTorch Version (if applicable): installed in container
Baremetal or Container (if container which image + tag):
nvcr.io/nvidia/tensorflow:24.10-tf2-py3
nvcr.io/nvidia/tensorflow:24.09-tf2-py3
Relevant Files
The attached screenshots show that version 24.07 opens the user's home directory (writable). Versions 24.09 and 24.10 open an NVIDIA folder inside the container (read-only); Jupyter cannot be directed away from it, and no notebook can be opened.
Steps To Reproduce
I’m an HPC facilitator at the University of Washington. We were very pleased to see that we could use nvcr.io/nvidia/tensorflow:24.07-tf2-py3 (an earlier release, from August 2024) with our Open OnDemand platform to let our users open Jupyter Lab on our GPUs. The Jupyter tensorflow-notebook container, by comparison, is not properly configured to use GPUs. We run a shared user environment with various GPUs and use Apptainer to run containers.
When you start a job with tensorflow:24.09-tf2-py3 or tensorflow:24.10-tf2-py3, Jupyter doesn’t start in the user’s home directory; it starts in an NVIDIA directory (where the license is) on a read-only path INSIDE the container. Hence, users cannot find a notebook in a bound filesystem, nor can they CREATE a new notebook. The container cannot be used with Jupyter: no files can be added, edited, or computed against.
Additionally, I started the Jupyter server within the container myself, specifying the bound filesystem to use (rather than attaching it with Open OnDemand):
apptainer exec --bind /gscratch/ --home $HOME tensorflow_nvgpu_24.10-tf2-py3.sif jupyter notebook --port 9195 --ip 0.0.0.0
I then set up SSH port forwarding, and the result was the same. For Jupyter, the container does not allow a bound filesystem, and the user space is unknown and disconnected from the filesystem.
By contrast, running a Python script with the same bind works perfectly:
apptainer exec --bind /gscratch/ tensorflow_nvgpu_24.10-tf2-py3.sif python tf_tutorial.py
Binding the filesystem is only prevented with Jupyter.
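One variation that may be worth testing is to tell both Apptainer and Jupyter explicitly where to start, instead of relying on the container's defaults. The sketch below is untested against these images and rests on assumptions: that Jupyter's `--notebook-dir` option (which maps to `ServerApp.root_dir` on Jupyter Server based releases) takes precedence over whatever the container configures, and that Apptainer's `--pwd` option sets the initial working directory inside the container. The names `NOTEBOOK_ROOT`, `PORT`, `DRY_RUN`, and `launch_jupyter` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Untested sketch: start Jupyter with an explicit, writable root directory.
# NOTEBOOK_ROOT, PORT, DRY_RUN and launch_jupyter are hypothetical names
# introduced for illustration; they are not part of the container or Apptainer.

SIF="tensorflow_nvgpu_24.10-tf2-py3.sif"
NOTEBOOK_ROOT="${NOTEBOOK_ROOT:-$HOME}"   # must be writable and visible inside the container
PORT="${PORT:-9195}"

launch_jupyter() {
    # Build the argument list in the positional parameters (POSIX-safe, no arrays).
    set -- apptainer exec \
        --bind /gscratch/ \
        --home "$HOME" \
        --pwd "$NOTEBOOK_ROOT" \
        "$SIF" \
        jupyter notebook \
        --notebook-dir="$NOTEBOOK_ROOT" \
        --port "$PORT" \
        --ip 0.0.0.0

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so it can be reviewed first.
        printf '%s ' "$@"
        printf '\n'
    else
        # DRY_RUN=0: actually run the command built above.
        "$@"
    fi
}

launch_jupyter
```

By default the script only prints the command; setting `DRY_RUN=0` runs it. If Jupyter still opens the read-only NVIDIA directory with these options, that would suggest the start directory is being forced somewhere other than the working directory or the command line, for example in a Jupyter configuration file shipped in the image.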
Am I missing something that could easily be added to the apptainer command that we use with Open OnDemand to be able to use the new versions of the container? I understand it is intentional to share the user license agreement with the container, but I can’t imagine it was intentional to bloat the size of the container with a version of Jupyter server that cannot be used. I’m hoping that I have missed a workaround, or that future versions will fix this issue by finding a way to share the license and other docs without disabling valuable features of the container.
We will continue to use nvcr.io/nvidia/tensorflow:24.07-tf2-py3, but it would be great to have the latest version of the container that allows Jupyter to be used in a writable, mounted directory.
Thank you,
Kristen Finch
HPC Staff Scientist - Hyak Team
University of Washington Research Computing
