Hello,
We operate a medium-sized GPU cluster.
Our users have difficulty accessing low-level performance counters via CUPTI because of insufficient privileges:
FATAL - CUPTI ERROR: src/cupti_util.c:1437:'err' failed. [Reason] CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
The well-known solution is to enable profiling universally for everyone as outlined here:
Instead, we are trying to access them under Singularity’s fakeroot.
The container is built with the following definition file:
Bootstrap: docker
From: nvidia/cuda:11.6.2-base-centos7
%post
yum -y update
yum -y install epel-release
# CentOS SCL repository
yum -y install centos-release-scl
# Install GCC 10.2
yum -y install devtoolset-10-gcc devtoolset-10-gcc-c++
# Install CUDA
yum -y install cuda-command-line-tools-11-6 \
cuda-cudart-dev-11-6 \
cuda-libraries-dev-11-6
yum clean all
%environment
export PATH=/opt/rh/devtoolset-10/root/usr/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
We then run the container with --fakeroot:
$ singularity shell --nv --fakeroot -B <abs_path> libnvcd.sif
INFO: Converting SIF file to temporary sandbox...
Singularity> whoami
root
Singularity> sh libnvcd.sh
...
FATAL - CUPTI ERROR: src/cupti_util.c:1437:'err' failed. [Reason] CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
Despite the elevated privileges inside the container, we still encounter the same permission error.
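For reference, here is a minimal sketch of how the driver-side setting could be checked, both on the host and inside the container. It assumes the NVIDIA driver exposes its module parameters in /proc/driver/nvidia/params and that the RmProfilingAdminOnly entry there reflects the admin-only restriction. The function name profiling_restricted is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Report whether the NVIDIA driver restricts GPU performance counters
# to admin users. The optional argument is the params file to read;
# it defaults to the driver's proc file (assumed location).
profiling_restricted() {
    params_file="${1:-/proc/driver/nvidia/params}"

    # Without a readable params file we cannot tell either way.
    if [ ! -r "$params_file" ]; then
        echo "unknown"
        return 0
    fi

    # Lines look like "RmProfilingAdminOnly: 1"; take the value after the colon.
    value=$(awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2 }' "$params_file")

    case "$value" in
        1) echo "restricted" ;;
        0) echo "unrestricted" ;;
        *) echo "unknown" ;;
    esac
}

# Run once on the host and once inside the container to compare.
profiling_restricted
```

If both report "restricted", that would suggest the restriction is enforced by the host driver rather than by the container runtime, though we have not confirmed this.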
- Are the performance counters blocked by design, even inside a container?
- If not, is there something we can configure to work around this restriction without enabling profiling globally?
Regards.