How to authenticate to nvcr.io with SLURM + pyxis?

How do I authenticate to the nvcr.io registry when running NGC containers with SLURM and pyxis? I have an API key, but I don't know where to put it.
I deployed SLURM with deepops and tried to run:

$ srun --container-image='docker://$oauthtoken@nvcr.io#nvidia/tensorflow:20.10-tf2-py3' grep PRETTY /etc/os-release

this gives:

pyxis: importing docker image ...
slurmstepd: error: pyxis: child 14756 failed with error code: 1
slurmstepd: error: pyxis: failed to import docker image
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis: [INFO] Querying registry for permission grant
slurmstepd: error: pyxis: [INFO] Authenticating with user: docker://$oauthtoken
slurmstepd: error: pyxis: [ERROR] URL https://nvcr.io/proxy_auth returned error code: 401 Unauthorized
slurmstepd: error: pyxis: couldn't start container

Logging in to nvcr.io with docker first makes no difference: srun fails with the same error.

Thanks for any help.

In the meantime I figured out how to download NGC images in a separate step with enroot.
Maybe it's possible to do it all in one step when calling srun, but my problem is solved for now.
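
For anyone wanting to try the two-step route, here is a minimal sketch. The post above does not give the exact commands, so this is an assumption about what they might look like: `enroot import -o` writes a squashfs file, and pyxis is then pointed at that file path instead of a registry URL. The helper names `run` and `import_and_run`, the `DRY_RUN` switch, and the output path are hypothetical and only there for illustration. The import step still needs working nvcr.io credentials for enroot.

```shell
#!/bin/sh
# Sketch only: import an NGC image with enroot first, then run it with srun.
# run, import_and_run and DRY_RUN are hypothetical names, not part of
# enroot or pyxis.

# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: import_and_run IMAGE_URL SQSH_PATH COMMAND [ARGS...]
import_and_run() {
    image_url=$1    # e.g. docker://nvcr.io#nvidia/tensorflow:20.10-tf2-py3
    sqsh_path=$2    # where the squashfs image is written (assumed path)
    shift 2
    # Step 1: download the image into a squashfs file.
    # Step 2: hand that file to pyxis; use an absolute or ./ path so it is
    # treated as a file rather than an image name.
    run enroot import -o "$sqsh_path" "$image_url" &&
        run srun --container-image="$sqsh_path" "$@"
}

# Example (commented out so that sourcing this file has no side effects):
# import_and_run 'docker://nvcr.io#nvidia/tensorflow:20.10-tf2-py3' \
#     "$HOME/tensorflow.sqsh" grep PRETTY /etc/os-release
```

Setting `DRY_RUN=1` before calling `import_and_run` only prints the two commands, which is handy for checking the image URL and path before touching the cluster.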

I have the same problem.

Please take a look at deepops/docker-login.md at master · NVIDIA/deepops · GitHub

EDIT: Solved. It was a silly mistake: once the .credentials file is configured, $oauthtoken does not go into the command. So the correct command is:

srun --container-image='nvidia/cuda:11.2.2-devel-ubuntu20.04' --gres=gpu:a100:1 --pty nvidia-smi -L

And it works.

– OLDER MESSAGE –
Having the same problem here.
I did what matthias.leopold mentioned, but it still fails.
I followed the documentation and made sure the ENROOT_CONFIG_PATH environment variable is set.

➜  ~ echo $ENROOT_CONFIG_PATH                                                                                         
/home/MYUSER/.config/enroot

My credentials file is:

➜  ~ cat $HOME/.config/enroot/.credentials 
# NVIDIA GPU Cloud (both endpoints are required)
machine nvcr.io login $oauthtoken password MYNGCTOKEN
machine authn.nvidia.com login $oauthtoken password MYNGCTOKEN

# DockerHub
# machine auth.docker.io login <login> password <password>

# Google Container Registry
# machine gcr.io login oauth2accesstoken password $(gcloud auth print-access-token)
# machine gcr.io login _json_key password $(jq -c '.' $GOOGLE_APPLICATION_CREDENTIALS | sed 's/ /\\u0020/g')
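
If you would rather create this file from a script than by hand, something like the following should do it. It is a sketch, not an official tool: `write_ngc_credentials` is a hypothetical helper, and the two `machine` lines simply mirror the NGC entries shown above. Note the single quotes around the format strings, which keep `$oauthtoken` literal; it is the fixed login name for NGC, not a shell variable.

```shell
#!/bin/sh
# Sketch only: write the two NGC entries into an enroot credentials file.
# write_ngc_credentials is a hypothetical helper, not part of enroot.

# Usage: write_ngc_credentials CONFIG_DIR NGC_API_KEY
write_ngc_credentials() {
    config_dir=$1    # e.g. "$ENROOT_CONFIG_PATH"
    api_key=$2       # your NGC API key
    cred_file=$config_dir/.credentials

    mkdir -p "$config_dir" || return 1
    # The single-quoted format strings keep $oauthtoken unexpanded;
    # only the API key is substituted via %s.
    # Note: this overwrites any existing .credentials file.
    {
        printf 'machine nvcr.io login $oauthtoken password %s\n' "$api_key"
        printf 'machine authn.nvidia.com login $oauthtoken password %s\n' "$api_key"
    } > "$cred_file" || return 1
    # The file holds a secret, so restrict it to the owner.
    chmod 600 "$cred_file"
}

# Example (commented out so that sourcing this file has no side effects):
# write_ngc_credentials "$HOME/.config/enroot" MYNGCTOKEN
```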

Finally, running a test example fails:

➜  ~ srun --container-image='docker://$oauthtoken@nvcr.io#nvidia/cuda:11.2.2-devel-ubuntu20.04' --gres=gpu:a100:1 --pty nvidia-smi -L
pyxis: importing docker image ...
slurmstepd: error: pyxis: child 810178 failed with error code: 1
slurmstepd: error: pyxis: failed to import docker image
slurmstepd: error: pyxis: printing contents of log file ...
slurmstepd: error: pyxis:     [INFO] Querying registry for permission grant
slurmstepd: error: pyxis:     [INFO] Authenticating with user: docker://$oauthtoken
slurmstepd: error: pyxis:     [ERROR] URL https://nvcr.io/proxy_auth returned error code: 401 Unauthorized
slurmstepd: error: pyxis: couldn't start container
slurmstepd: error: pyxis: if the image has an unusual entrypoint, try using --no-container-entrypoint
slurmstepd: error: spank: required plugin spank_pyxis.so: task_init() failed with rc=-1
slurmstepd: error: Failed to invoke spank plugin stack
srun: error: nodeGPU01: task 0: Exited with exit code 1

Any ideas what is missing? Thanks in advance.

Can you try without the user in the docker URL?
srun --container-image='docker://nvcr.io#nvidia/cuda:11.2.2-devel-ubuntu20.04'

Matthias

Yes, that was the problem. Many thanks!
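
In case job scripts are still being handed image URLs with the user name embedded, here is a small sketch that drops it before the URL is passed to `--container-image`. `strip_registry_user` is a hypothetical helper; it only handles the `docker://USER@REGISTRY#IMAGE` form seen in this thread and leaves anything else untouched.

```shell
#!/bin/sh
# Sketch only: remove an embedded "USER@" from a docker:// image URL so that
# the login is taken from the enroot credentials file instead.
# strip_registry_user is a hypothetical helper, not part of pyxis or enroot.

# Usage: strip_registry_user URL
strip_registry_user() {
    url=$1
    case $url in
        docker://*@*#*)
            # Drop the scheme, cut everything up to the first "@",
            # then put the scheme back.
            rest=${url#docker://}
            printf 'docker://%s\n' "${rest#*@}"
            ;;
        *)
            # No user part in the expected position: print unchanged.
            printf '%s\n' "$url"
            ;;
    esac
}

# Example (commented out so that sourcing this file has no side effects):
# image=$(strip_registry_user 'docker://$oauthtoken@nvcr.io#nvidia/cuda:11.2.2-devel-ubuntu20.04')
# srun --container-image="$image" --gres=gpu:a100:1 --pty nvidia-smi -L
```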