I’ve got my DGX Spark working well with the NGC PyTorch container referenced in the subject. However, as expected, the container runs as root by default. No real issues with that. I’m able to bind-mount the home directory of the user created during the DGX Spark’s initial boot, and my GitHub repositories are available from VS Code and working in the attached container.
The problem is that files I create or modify end up owned by root on the DGX Spark file system. I can sudo chown -R to fix that, but I don’t want to do it every time I develop inside the container. So I’m building a new container using pytorch:26.02-py3 as the starting point, and in it I create a new user and group with the same UID and GID as the host user created during the DGX Spark’s initial configuration: uid=1000, gid=1000.
The docker build fails with: Error response from daemon: unable to find user: no matching entries in passwd file. So I launch the base container again and find that UID 1000 and GID 1000 already exist in /etc/passwd and /etc/group: grepping /etc/passwd for 1000 returns the ubuntu user, and grepping /etc/group for 1000 returns the ubuntu group.
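To check up front whether an image already claims a numeric ID, you can look it up in a copy of the image’s /etc/passwd or /etc/group. This is a minimal sketch: owner_of_id is a hypothetical helper, not an existing tool. It relies only on the fact that both files keep the numeric ID in the third colon-separated field. The image tag in the comments assumes the 26.02-py3 tag used in the Dockerfile later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name(s) in a passwd- or group-format file
# whose numeric ID (third colon-separated field) equals the given ID.
owner_of_id() {
  local file="$1" id="$2"
  awk -F: -v id="$id" '$3 == id { print $1 }' "$file"
}

# Example (assumed image tag; --entrypoint keeps any entrypoint output
# out of the copied files):
#   docker run --rm --entrypoint cat nvcr.io/nvidia/pytorch:26.02-py3 /etc/passwd > image_passwd
#   docker run --rm --entrypoint cat nvcr.io/nvidia/pytorch:26.02-py3 /etc/group  > image_group
#   owner_of_id image_passwd 1000   # this thread reports: ubuntu
#   owner_of_id image_group 1000
```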
Are other users seeing this? Any workarounds? It would be great if the DGX Spark started its user IDs in a range other than 1000, or if the container’s ubuntu user/group used a different range.
I don’t want to use shadow files or create a new user on the DGX Spark. So for now, I’m stuck running an aliased sudo chown -R after each coding session in the container to fix user/group ownership on the DGX Spark host.
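While the cleanup step is still needed, a shell function can be a little more targeted than an alias around sudo chown -R: it can list what a root-run container left behind and change only those entries. This is a minimal sketch with hypothetical helper names (not_mine, fix_owner). It assumes GNU find, which accepts numeric IDs for -user and -group. The directory in the usage comment is a placeholder.

```shell
# Hypothetical helper: list paths under DIR whose owner or group is not the
# invoking user's numeric UID/GID (for example, files a root container wrote).
not_mine() {
  local dir="$1" uid gid
  uid="$(id -u)"; gid="$(id -g)"
  find "$dir" \( ! -user "$uid" -o ! -group "$gid" \) -print
}

# Hypothetical helper: hand only those paths back to the invoking user.
# chown -h changes symlinks themselves rather than the files they point to.
fix_owner() {
  local dir="$1" uid gid
  uid="$(id -u)"; gid="$(id -g)"
  find "$dir" \( ! -user "$uid" -o ! -group "$gid" \) \
    -exec sudo chown -h "$uid:$gid" {} +
}

# Example (placeholder path):
#   not_mine ~/src     # review first
#   fix_owner ~/src    # then fix
```

Unlike chown -R, this leaves files that are already yours untouched, so their ctime is not bumped on every run.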
@prmcd.sw.engineering add yourself to the docker group with sudo usermod -aG docker username, then log out and log back in. Replace username with your actual user name.
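The log-out step matters because a session’s group list is fixed when it starts. This is a minimal sketch of a check, using a hypothetical helper name (session_in_group), that tests whether the current session actually has a group.

```shell
# Hypothetical helper: succeed if the *current session* has GROUP.
# With no user argument, `id -nG` reports the running process's groups, so a
# group added with usermod -aG only appears here after logging back in.
# (`id -nG username` instead reads the account database and shows it at once.)
session_in_group() {
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

# Example:
#   session_in_group docker && echo "docker group is active in this session"
```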
I don’t like running as root either.
I create a user in the container matching the host uid, and run with --user $USER:$USER.
Here is an example Dockerfile I use:
# docker build --progress plain -t jupyter .
# docker run --user=${USER}:${USER}
# https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch
ARG VER=26.02-py3
FROM nvcr.io/nvidia/pytorch:${VER}
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends \
        emacs-nox \
        libxcb1 \
    && \
    rm -rf /var/lib/apt/lists/*
COPY .emacs /root
RUN pip install -U pip
RUN pip install \
    ipywidgets
# uid 1000 is already taken by the image's ubuntu user,
# so useradd below assigns the next free uid, 1001
ARG USER=ed
RUN useradd -m -s /bin/bash ${USER}
USER ${USER}
RUN mkdir /home/${USER}/.cache
COPY --chown=${USER}:${USER} .emacs /home/${USER}/
WORKDIR /home/${USER}/Downloads/python/pytorch
CMD ["jupyter", "lab"]
Note: I find emacs -nw to be the best terminal mux program.
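Docker also accepts numeric IDs for --user. A numeric UID:GID does not need a matching entry in the image’s /etc/passwd; only a user name has to resolve there, and a missing name is what the "unable to find user ... no matching entries in passwd file" error reports. Ownership on a bind mount is recorded by number. So, assuming Docker’s default setup without user-namespace remapping, running as your host UID:GID makes files you create show up as yours on the host. This holds even if that ID has a different name inside the image (here, 1000 is ubuntu). This is a minimal sketch: host_user_spec is a hypothetical helper, and the mount path and image tag in the comment are placeholders.

```shell
# Hypothetical helper: the numeric "UID:GID" of the invoking host user,
# suitable for `docker run --user`.
host_user_spec() {
  printf '%s:%s\n' "$(id -u)" "$(id -g)"
}

# Example (placeholder mount path, assumed image tag):
#   docker run --rm -it \
#     --user "$(host_user_spec)" \
#     -v "$HOME/src:/workspace/src" \
#     nvcr.io/nvidia/pytorch:26.02-py3 bash
# Files created under /workspace/src are then owned by your host UID/GID.
```

The trade-off: if the numeric UID has no passwd entry in the image, the process has no user name and possibly no writable home directory, so tools that write under $HOME may need -e HOME=... pointing somewhere writable.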
Thanks @elsaco. I added myself to the docker group during initial configuration, so the DGX Spark’s initial user can already use Docker. No issues there. The problem is the resulting UID/GID: both are 1000 for the first user on the DGX Spark, and the container already has a UID/GID of 1000 that belongs to a different user (ubuntu), not the DGX Spark’s initial user. I think I’m going to go ahead and create a new user/group on the DGX Spark, which should bump both the UID and GID to 1001, and then use @ed.swarthout’s user addition in my Dockerfile.
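Before creating the second host account, it can help to confirm which ID is free on both sides. This is a minimal sketch with a hypothetical helper, first_free_id. Given one or more passwd- or group-format files, such as the host’s /etc/passwd and a copy of the image’s, it prints the smallest ID at or above a starting value that none of them use. Starting at 1000 reflects the usual first regular-user ID on Ubuntu-based systems, which is where both the DGX Spark user and the image’s ubuntu account sit in this thread.

```shell
# Hypothetical helper: print the smallest numeric ID >= START that does not
# appear in field 3 of any of the given passwd/group-format files.
first_free_id() {
  local start="$1"; shift
  awk -F: -v start="$start" '
    { used[$3] = 1 }
    END {
      id = start
      while (id in used) id++
      print id
    }' "$@"
}

# Example (image_passwd copied out of the image beforehand; newuser is a placeholder):
#   first_free_id 1000 /etc/passwd image_passwd
#   sudo useradd -m -u "$(first_free_id 1000 /etc/passwd image_passwd)" newuser
```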
Thanks @ed.swarthout. I’m going to create a new user/group on the DGX Spark; hopefully that will bump both to 1001. Then I’ll use the Dockerfile mods, which I was using before and which eventually led me to confirm that the pytorch:26.02-py3 container already has a UID/GID that conflicts with the user created at the DGX Spark’s initial startup. I plan to spend some cycles on that this weekend, though DST is going to steal an hour from me. :) Thanks.
I have resolved the issue. I created a new user on the DGX Spark and updated my Dockerfile with that user. No conflicts. Lesson learned on that front. I’ve kept the commands for the next time I need to do this downstream.