"NvRmMemInitNvmap failed with Permission denied" error when running nvidia-docker in rootless mode on Jetson Orin Nano

Hi,

Sorry for missing this.
We will check this internally and provide more info to you.

Thanks.

Hi,

Thanks for your patience.

Although we are still working on this issue, here are some updates for you.
To enable /dev/nvmap access inside the rootless container, you can set the permission like this:

Tested with the user “test_user”:

1. Create a group

Using “test_user_group” here:

$ sudo groupadd test_user_group
$ sudo usermod -aG test_user_group test_user
$ sudo chown root:test_user_group /dev/nvmap

Re-login or reboot and verify with:

$ cat /dev/nvmap 
cat: /dev/nvmap: Invalid argument

It’s expected to see “Invalid argument” instead of “Permission denied”, which indicates the permission has been enabled for the ‘test_user_group’.
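
The ownership change can also be confirmed without reading the device. The following is only a minimal sketch, assuming GNU coreutils `stat` is available; the function name `check_dev_group` is hypothetical and not part of any NVIDIA tooling:

```shell
#!/bin/bash
# Hypothetical helper: report whether a path is owned by the expected group.
check_dev_group() {
    local dev=$1 want=$2 have
    # %G prints the name of the owning group (GNU stat).
    have=$(stat -c '%G' "$dev") || return 2
    if [ "$have" = "$want" ]; then
        echo "$dev: group is $have"
    else
        echo "$dev: group is $have, expected $want"
        return 1
    fi
}

# Example (on the Jetson): check_dev_group /dev/nvmap test_user_group
```

A non-zero return status means the group does not match (or the path could not be read), so the check can be used in a setup script before the subgid changes are made.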

2. Update subgid setting

In order to utilize these permissions in the rootless docker container, please also edit /etc/subgid.

Obtain the group ID number; it is 1003 in the example below.

$ getent group test_user_group
test_user_group:x:1003:test_user

Original /etc/subgid:

# [username:subgid_start:subgid_length]
$ cat /etc/subgid
...
test_user:165536:65536

Modify into:

# [username:subgid_start:group_id - 1, username:group_id:1, username:subgid_start + group_id + 1:subgid_length - (group_id + 1)]
$ cat /etc/subgid
...
test_user:165536:1002
test_user:1003:1
test_user:166540:64542

3. Restart the docker service

$ systemctl --user restart docker.service 

4. Testing

# cat /dev/nvmap 
cat: /dev/nvmap: Invalid argument

We can access the /dev/nvmap inside the container after the above steps.
However, our container still fails to initialize CUDA because of another permission issue, which we are still checking.

Could you also give it a try in your environment?

Thanks.

Hi,

Yes, I can confirm that after following your instructions the error changed to:

ERROR: The NVIDIA Driver is present, but CUDA failed to initialize. GPU functionality will not be available.
[[ Operation not supported (error 801) ]]

Failed to detect NVIDIA driver version.

For the sake of completeness, I think there is an arithmetic error in the third line of the /etc/subgid file:
Following your formula, it should be ‘test_user:166540:64532’, right?

Since rootless access to the GPU already works for the “ubuntu” user, why is it necessary to grant file permissions to “test_user”?

Hi,

Sorry for the mistake.
The third line is test_user:166540:64532
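
The arithmetic behind the three lines can be checked with a short sketch. This only restates the formula from the earlier post for a single pass-through group; the function name `split_subgid_single` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: split "user:start:count" around one pass-through GID,
# following the formula given earlier in this thread.
split_subgid_single() {
    local entry=$1 gid=$2 name start count
    IFS=: read -r name start count <<EOF
$entry
EOF
    echo "$name:$start:$((gid - 1))"                        # range before the group
    echo "$name:$gid:1"                                     # the group itself
    echo "$name:$((start + gid + 1)):$((count - gid - 1))"  # remainder
}

split_subgid_single "test_user:165536:65536" 1003
# prints:
#   test_user:165536:1002
#   test_user:1003:1
#   test_user:166540:64532
```

The last length is 65536 - 1003 - 1 = 64532, which matches the corrected value.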

The setting is to allow the rootless docker container to utilize the permissions.
Thanks.

Hi,

Thanks a lot for your patience.
On the Jetson device, the render group is also required for the container. The detailed steps are:

Add test_user

$ sudo groupadd test_user_group
$ sudo useradd -m -g test_user_group -G video,render test_user
$ sudo passwd test_user

Edit /etc/subgid

Before

ubuntu:100000:65536
test_user:165536:65536

After

ubuntu:100000:65536
test_user:165536:43
test_user:44:1
test_user:165581:59
test_user:104:1
test_user:165641:65431
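
To see why the range is split this way, the mapping can be simulated. The sketch below assumes that rootless Docker maps container GID 0 to the user's own GID and then assigns container GIDs 1, 2, ... to the subgid ranges in the order they are listed; that ordering is an assumption here, and the function name `container_gid_for` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the container GID that a host GID lands on,
# given one user's subgid entries on stdin in file order.
# Assumes container GID 0 is the user's own GID and the ranges fill 1, 2, ...
container_gid_for() {
    local host_gid=$1 next=1 name start count
    while IFS=: read -r name start count; do
        if [ "$host_gid" -ge "$start" ] && [ "$host_gid" -lt $((start + count)) ]; then
            echo $((next + host_gid - start))
            return 0
        fi
        next=$((next + count))
    done
    return 1   # host GID is not covered by any range
}

entries='test_user:165536:43
test_user:44:1
test_user:165581:59
test_user:104:1
test_user:165641:65431'

echo "$entries" | container_gid_for 44    # prints 44
echo "$entries" | container_gid_for 104   # prints 104
```

Under that assumption, host GIDs 44 and 104 appear inside the container under the same numbers, whereas with the original single entry they are not covered by any range.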

Login as test_user

$ dockerd-rootless-setuptool.sh install
$ nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
$ systemctl --user restart docker
$ id
uid=1001(test_user) gid=1001(test_user_group) groups=1001(test_user_group),44(video),104(render)
$ docker run -it --rm --net=host --runtime nvidia --group-add=video --group-add=104 nvcr.io/nvidia/pytorch:24.12-py3-igpu

=============
== PyTorch ==
=============

NVIDIA Release 24.12 (build 126674151)
PyTorch Version 2.6.0a0+df5bbc0
Container image Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2024 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

groups: cannot find name for group ID 104
root@tegra-ubuntu:/workspace# python
Python 3.12.3 (main, Nov  6 2024, 18:32:19) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.rand(10).to(torch.device("cuda"))
tensor([0.1885, 0.9269, 0.7221, 0.9849, 0.4552, 0.9575, 0.3322, 0.7051, 0.8399,
        0.4589], device='cuda:0')

Thanks.

Great, thanks!! This also works for me!!

Can the message “groups: cannot find name for group ID 104” be ignored?

Yes, the message can be ignored.
Thanks for the confirmation.
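
The message is printed when a GID has no name in the group database that `groups` consults, which here is the container's, so the numeric GID still applies. On a device where the GIDs differ, the `--group-add` values can be taken from the host instead of being hard-coded. A minimal sketch; the function name `group_add_flags` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print one --group-add flag per numeric GID given.
group_add_flags() {
    local flags="" gid
    for gid in "$@"; do
        flags="$flags --group-add=$gid"
    done
    echo "${flags# }"   # drop the leading space
}

# On the Jetson, the GIDs would come from the host's group database:
#   video_gid=$(getent group video | cut -d: -f3)
#   render_gid=$(getent group render | cut -d: -f3)
#   docker run -it --rm --net=host --runtime nvidia \
#       $(group_add_flags "$video_gid" "$render_gid") \
#       nvcr.io/nvidia/pytorch:24.12-py3-igpu
group_add_flags 44 104   # prints: --group-add=44 --group-add=104
```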

Even though it immediately worked on my device, we had problems setting it up on a different one, where user id and render group id are different.

Could you please provide the formulae for the subgid file?

Hi,

Sorry for the late update.
We need to check with our internal team and will share more information with you later.

Thanks.

Hi,

Please find the details below.

The previous samples are based on:

GID of group video : 44
GID of group render : 104
Current subgid entry for user: test_user:165536:65536

If any of these differ on your system, please use the following script to generate the new subgid entries:

$ cat ./gen-subgid.sh 
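#!/bin/bash
# Split a user's /etc/subgid range so that the host "video" and "render"
# GIDs each get their own single-GID entry.
# Usage: ./gen-subgid.sh "user:subgid_start:subgid_length"
set -eu
CURR_SUBGID=${1:?usage: $0 user:subgid_start:subgid_length}

# Look up the host GIDs of the two groups.
GID_VIDEO=$(getent group video | cut -d: -f3)
GID_RENDER=$(getent group render | cut -d: -f3)

echo "GID of group video : ${GID_VIDEO}"
echo "GID of group render : ${GID_RENDER}"

# Order the two GIDs so that GROUP1 < GROUP2.
if [ "${GID_VIDEO}" -lt "${GID_RENDER}" ] ; then
        GROUP1=$GID_VIDEO
        GROUP2=$GID_RENDER
else
        GROUP1=$GID_RENDER
        GROUP2=$GID_VIDEO
fi

# Parse the current entry; SUBUSER avoids overwriting the shell's $USER.
IFS=':' read -r SUBUSER START COUNT << EOF
$CURR_SUBGID
EOF

echo ""
echo "Current subgid entry for user:"
echo "$CURR_SUBGID"

echo ""
echo "New subgid entry for user:"
# Range before the lower group, then the lower group itself.
echo "$SUBUSER:$START:$((GROUP1 - 1))"
echo "$SUBUSER:$GROUP1:1"
# Range between the two groups, then the higher group itself.
echo "$SUBUSER:$((START + GROUP1 + 1)):$((GROUP2 - GROUP1 - 1))"
echo "$SUBUSER:$GROUP2:1"
# Remainder of the original range.
echo "$SUBUSER:$((START + GROUP2 + 1)):$((COUNT - GROUP2 - 1))"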

Usage example:

$ ./gen-subgid.sh "test_user:165536:65536"
GID of group video : 44
GID of group render : 104

Current subgid entry for user:
test_user:165536:65536

New subgid entry for user:
test_user:165536:43
test_user:44:1
test_user:165581:59
test_user:104:1
test_user:165641:65431

Please note that adding another user re-orders /etc/subgid, which breaks the list above.
So please re-create the /etc/subgid entries after a user is added to the system.
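
A quick way to notice that the entries need to be re-created is to check that the single-GID lines are still present. This is only a sketch; the function name `check_passthrough` is hypothetical, and it checks for the presence of the lines, not their order:

```shell
#!/bin/bash
# Hypothetical helper: succeed only if FILE contains a "USER:GID:1" line
# for every GID given after the file and user arguments.
check_passthrough() {
    local file=$1 user=$2 gid missing=0
    shift 2
    for gid in "$@"; do
        # -x matches the whole line, -F treats the pattern as a fixed string.
        if ! grep -qxF "$user:$gid:1" "$file"; then
            echo "missing: $user:$gid:1"
            missing=1
        fi
    done
    return "$missing"
}

# Example (on the Jetson):
#   check_passthrough /etc/subgid test_user \
#       "$(getent group video | cut -d: -f3)" \
#       "$(getent group render | cut -d: -f3)"
```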

Thanks.

Thank you!! This resolves this issue!!
