TensorFlow container opening Jupyter Server directory in read-only location - Jupyter Lab cannot be used

Description

The TensorFlow container opens the Jupyter Server directory in a read-only location, so Jupyter Lab cannot be used.

Environment

TensorRT Version: 24.10, 24.09, and 24.07 discussed
GPU Type: various (HPC environment with a100, l40, l40s, rtx6k, titan)
Nvidia Driver Version: installed in container
CUDA Version: installed in container
CUDNN Version: installed in container
Operating System + Version: Linux Rocky 8 + Apptainer
Python Version (if applicable): installed in container
TensorFlow Version (if applicable): installed in container
PyTorch Version (if applicable): installed in container
Baremetal or Container (if container which image + tag):
nvcr.io/nvidia/tensorflow:24.10-tf2-py3
nvcr.io/nvidia/tensorflow:24.09-tf2-py3

Relevant Files

The attached screenshots show that version 24.07 opens the user’s home directory (writable). Versions 24.09 and 24.10 open an NVIDIA folder inside the container (read-only); the file browser cannot be directed away from it, and no notebook can be opened.

Steps To Reproduce

I’m an HPC facilitator at the University of Washington. We run a shared user environment with various GPUs and use Apptainer to run containers. We were very pleased to see that we could use nvcr.io/nvidia/tensorflow:24.07-tf2-py3 (previous version - August 2024) with our Open OnDemand platform to allow our users to open Jupyter Lab on our GPUs. (The Jupyter tensorflow-notebook container, by contrast, is not properly configured to use GPUs.)

When you start a job with tensorflow:24.09-tf2-py3 or tensorflow:24.10-tf2-py3, Jupyter doesn’t start in the user’s home directory. Instead, it starts in an NVIDIA directory (where the license is) on a read-only path INSIDE the container. As a result, users cannot find a notebook in a bound filesystem, nor can they CREATE a new notebook. The container cannot be used with Jupyter: no files can be added, edited, or computed against.

Additionally, I started Jupyter server within the container myself, specifying the bound filesystem to use (rather than attaching it with Open OnDemand):

apptainer exec --bind /gscratch/ --home $HOME tensorflow_nvgpu_24.10-tf2-py3.sif jupyter notebook --port 9195 --ip 0.0.0.0

I then set up ssh port forwarding, and the result was the same. For Jupyter, the container does not expose the bound filesystem, and the directory Jupyter opens is disconnected from the user’s filesystem.
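For readers unfamiliar with the port-forwarding step, here is a minimal sketch. The login host, compute node name, and user name are hypothetical placeholders and do not come from the original post; only port 9195 matches the Jupyter command above. The script just prints the tunnel command, which you would then run on your local machine.

```shell
# Hypothetical values; replace with your cluster's login host, the compute
# node running the job, and your own account name.
LOGIN_HOST="login.example.edu"
COMPUTE_NODE="gpu-node01"
CLUSTER_USER="your_netid"
PORT=9195   # same port passed to `jupyter notebook --port`

# Print an ssh command that forwards local PORT to PORT on the compute node,
# hopping through the login host. -N opens the tunnel without a remote shell.
tunnel_cmd() {
  printf '%s\n' "ssh -N -L ${PORT}:${COMPUTE_NODE}:${PORT} ${CLUSTER_USER}@${LOGIN_HOST}"
}

tunnel_cmd
```

With the tunnel open, Jupyter would be reachable at http://localhost:9195 in a local browser.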

By contrast, running a script works perfectly:

apptainer exec --bind /gscratch/ tensorflow_nvgpu_24.10-tf2-py3.sif python tf_tutorial.py

So the bound filesystem is only inaccessible from Jupyter.

Am I missing something that could easily be added to the apptainer command that we use with Open OnDemand so that we can use the new versions of the container? I understand it is intentional to share the user license agreement with the container, but I can’t imagine it was intentional to bloat the size of the container with a version of Jupyter server that cannot be used. I’m hoping that I have missed a workaround, or that future versions will find a way to share the license and other docs without blocking valuable features of the container.

We will continue to use nvcr.io/nvidia/tensorflow:24.07-tf2-py3, but it would be great to have the latest version of the container that allows Jupyter to be used in a writable, mounted directory.

Thank you,

Kristen Finch
HPC Staff Scientist - Hyak Team
University of Washington Research Computing

Kristen, did you ever get this to work or put a workaround in place? I am having the same issues with PyTorch containers. Unfortunately, even the TensorFlow container that previously worked for you isn’t working for me.

Thanks!
Dori Sajdak
Univ at Buffalo CCR

Hey Dori,

Yes, I ended up reaching out to a contact we have at NVIDIA who prompted their Solutions Architects to look into this. Their solution was to add

--NotebookApp.root_dir=$HOME

to the apptainer exec command. This fixed the issue in the scenarios that I laid out above.
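As a rough sketch, the flag could be appended to the manual launch command from the original post like this. The image file name, bind path, and port are copied from that post. The `RUN` switch is a hypothetical convenience of this sketch: by default the script only prints the command, so it can be inspected before anything is launched. The flag is spelled exactly as reported in this thread; whether a given Jupyter version accepts that spelling is not confirmed here, so checking `jupyter notebook --help-all` inside the container is advisable.

```shell
# Values taken from the commands earlier in this thread.
SIF="tensorflow_nvgpu_24.10-tf2-py3.sif"
BIND_PATH="/gscratch/"
PORT=9195

# Build the apptainer command with the root-directory flag appended.
# Prints the command by default; set RUN=1 (hypothetical switch) to execute it.
jupyter_cmd() {
  set -- apptainer exec --bind "$BIND_PATH" --home "$HOME" "$SIF" \
    jupyter notebook --port "$PORT" --ip 0.0.0.0 \
    "--NotebookApp.root_dir=$HOME"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

jupyter_cmd
```

Running the script with `RUN=1` set in the environment would start the server instead of printing the command.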

Our final solution for Open OnDemand was to change the submit script to open the user’s home directory (see hyak-jupyter/template/script.sh.erb on the main branch of UWrc/hyak-jupyter on GitHub), so we didn’t use the NotebookApp root dir flag after all.

Hope this helps!

Kristen


Kristen,

Thanks very much for getting back to me and sharing your solution! I really appreciate it.

Dori
