Relation between MPS, Nsight Systems, CUDA Drivers and Singularity Containers

I am trying to profile a multi-process application running inside a container with MPS. The relationship between MPS and multi-process applications is clear to me: MPS merges the GPU contexts of multiple processes into a single one, avoiding the overhead of context switching. I had no problem profiling the application without containers. However, I am confused about what happens when Singularity containers are involved.

First of all, the "--nv" option is needed to expose host GPU resources to Singularity containers. My understanding is that only the CUDA driver is shared with the container through "--nv", while the CUDA libraries (the toolkit) are installed inside the container image. I am wondering if this understanding is accurate.
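For context, here is roughly how I launch the container and check where each component comes from (the image name "app.sif" is just a placeholder for my setup):

```shell
# Launch the container with host GPU support; "app.sif" is a placeholder image name
singularity exec --nv app.sif nvidia-smi      # reported driver version comes from the host
singularity exec --nv app.sif nvcc --version  # reported toolkit version comes from the image
```
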

Secondly, I am wondering how MPS interacts with applications inside containers. Should the MPS control daemon be started from within the container, or is it shared from the host through "--nv"? Also, I observed that root privilege is needed to write to "/var/log/nvidia-mps"; inside the container, the log cannot be written there when I start MPS. What is the correct way to use MPS with containers?
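For reference, the workaround I have been trying is to redirect the MPS pipe and log directories to user-writable locations before starting the daemon (the paths under /tmp are arbitrary choices of mine):

```shell
# Redirect MPS pipe/log directories to user-writable paths (arbitrary choices here)
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"

# Start the MPS control daemon in the background
nvidia-cuda-mps-control -d
```

My assumption is that client processes must see the same CUDA_MPS_PIPE_DIRECTORY value in their environment in order to connect to the daemon, but I am not sure how that carries over into the container.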

Thirdly, how can I profile MPS-based multi-process applications from inside a container? In my application, a server handles jobs (including kernel launches) forwarded from multiple clients, and it is the server's performance that I care about. Specifically, what would the Nsight Systems command line look like? Should Nsight Systems be started from within the container on the command line, or from the outside using the GUI?
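To be concrete, my naive attempt would look something like the following, profiling only the server process from inside the container ("app.sif", "./server", and the report name are placeholders for my setup):

```shell
# Profile the server process with Nsight Systems from inside the container;
# "app.sif", "./server", and "mps_server_report" are placeholders
singularity exec --nv app.sif \
    nsys profile --trace=cuda,osrt -o mps_server_report ./server
```

I am unsure whether this captures the kernels that MPS executes on behalf of the clients, or only activity attributed to the server process itself.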

I would really appreciate it if someone could offer general advice for these tasks. Thanks in advance.