Accessing CUPTI Performance Counters as fakeroot inside a Singularity Container

Hello,

We operate a medium-sized GPU cluster.
Our users cannot access low-level performance counters via CUPTI because of insufficient privileges:

FATAL - CUPTI ERROR: src/cupti_util.c:1437:'err' failed. [Reason] CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

The well-known solution is to enable profiling universally for everyone as outlined here:

Instead, we are trying to access them under Singularity’s fakeroot.
The container is built with the following definition file:

Bootstrap: docker
From: nvidia/cuda:11.6.2-base-centos7

%post
    yum -y update
    yum -y install epel-release

    # CentOS SCL repository
    yum -y install centos-release-scl

    # Install GCC 10.2
    yum -y install devtoolset-10-gcc devtoolset-10-gcc-c++

    # Install CUDA
    yum -y install cuda-command-line-tools-11-6 \
        cuda-cudart-dev-11-6 \
        cuda-libraries-dev-11-6

    yum clean all

%environment
    export PATH=/opt/rh/devtoolset-10/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
    export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64

We then run the container with --fakeroot:

$ singularity shell --nv --fakeroot -B <abs_path> libnvcd.sif
INFO:    Converting SIF file to temporary sandbox...
Singularity> whoami 
root
Singularity> sh libnvcd.sh 
...
FATAL - CUPTI ERROR: src/cupti_util.c:1437:'err' failed. [Reason] CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

Here, despite the elevated privilege, we still encounter the same permission error.

  1. Are the performance counters blocked by design, even inside a container?
  2. If not, is there something we can configure to lift this restriction without enabling profiling globally?

Regards.

Hi, @vitduck

“despite the elevated privilege, we still encounter the same permission error.”
Do you mean that you had already elevated privileges outside Singularity and still got this error?

Hi, @veraj
Sorry for not being clear.

On bare-metal, everything works as expected:

  • Normal users get the aforementioned permission error.
  • Sudoers can access performance counters.
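
The bare-metal behavior above matches a driver that restricts counters to admin users. A minimal sketch for checking which policy is in effect, assuming the NVIDIA kernel driver exposes its settings in /proc/driver/nvidia/params with a RmProfilingAdminOnly entry (the function name profiling_restricted is only illustrative):

```shell
#!/bin/sh
# Report whether the NVIDIA kernel driver restricts GPU performance counters
# to admin users. The params file path is an argument so it can be overridden
# (assumption: the default path below exists on hosts with the driver loaded).
profiling_restricted() {
    params_file="${1:-/proc/driver/nvidia/params}"
    if [ ! -r "$params_file" ]; then
        echo "unknown"
        return 0
    fi
    # The entry looks like "RmProfilingAdminOnly: 1"; take the value after the colon.
    value=$(awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2; exit }' "$params_file")
    case "$value" in
        1) echo "restricted" ;;
        0) echo "unrestricted" ;;
        *) echo "unknown" ;;
    esac
}

profiling_restricted
```

If this prints "restricted", only privileged host users can read the counters. The universal alternative mentioned earlier corresponds, as far as we understand it, to loading the nvidia module with NVreg_RestrictProfilingToAdminUsers=0.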

However, upper management is reluctant to enable low-level profiling for all users, which is what NVIDIA’s guidelines describe.
Thus, we are testing a way for our users to access performance counters inside a container environment.

As you can see from the Singularity prompt, with the --fakeroot option the user becomes root (but only within the container).
If my understanding is correct, I should be able to access CUPTI as root inside a container.
However, the permission error still persists, hence my confusion.

Please let me know if you need more information.

Hi, @vitduck

I am afraid this is because --fakeroot does not grant sufficient access permissions to the host device. This behavior is not determined by CUPTI.
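
One way to see why root inside the container differs from root on the host is to inspect the user-namespace UID mapping. The sketch below assumes a Linux /proc filesystem, that --fakeroot is implemented with a user namespace (as Singularity documents), and that the container sits one level below the host's initial namespace. The function name root_host_uid is hypothetical.

```shell
#!/bin/sh
# Print the host UID that UID 0 of the current user namespace maps to.
# Each uid_map line has three fields:
#   <first UID inside the namespace> <first UID outside> <range length>
root_host_uid() {
    map_file="${1:-/proc/self/uid_map}"
    if [ ! -r "$map_file" ]; then
        echo "unknown"
        return 0
    fi
    # Find the mapping that starts at inside-UID 0 and print its outside UID.
    awk '$1 == 0 { print $2; found = 1; exit }
         END { if (!found) print "unmapped" }' "$map_file"
}

root_host_uid
```

Run on the host, this would be expected to print 0. Run inside the --fakeroot shell, it should print the invoking user's own unprivileged UID, which suggests that a privilege check made on the host side still sees a regular user.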

Yes, it seems to be the case with Singularity.
There is no other way to circumvent this problem for regular users.

Thanks for your clarification.