Last week I tried to profile my CUDA program with Nsight Systems inside a Kubernetes pod, but Nsight Systems did not generate a report when the program finished. nsys printed an error like this:
Connection to Agent lost. This is most likely a bug. Internal reason: ‘End of file’. Please refer to the troubleshooting section of the docs: User Guide — nsight-systems
I’m using CUDA driver 570 and CUDA toolkit 12.8. I downloaded the Nsight Systems CLI 2025.5.1.121 and installed it in the pod manually. I asked an LLM how to fix the bug, and here are the solutions I tried.
- Add the pod capabilities SYS_PTRACE, SYS_ADMIN, and IPC_LOCK. Didn’t work.

      securityContext:
        runAsUser: 0
        runAsGroup: 0
        allowPrivilegeEscalation: true
        capabilities:
          add:
            - SYS_PTRACE
            - SYS_ADMIN
            - IPC_LOCK

- Enlarge /dev/shm to 16 GiB. Didn’t work.

      # entry in the container's volumeMounts
      - mountPath: /dev/shm
        name: dshm

      # entry in the pod's volumes
      - name: dshm
        emptyDir:
          medium: Memory
          sizeLimit: 16Gi

- Enable hostPID. This solved the “End of file” error, but nsys then reported a new error while processing the QDSTRM file. The QDSTRM file was generated, but the .nsys-rep file was not.

      hostPID: true  # Allow access to the host PID namespace

  The new error:

      Importer error status: An unknown error occurred. Unable to retrieve the importer version: skipping importation of the QDSTRM file.

- Run the importer by hand. According to the Nsight Systems User Guide, if the .nsys-rep file is not generated, QdstrmImporter can convert the QDSTRM file to an .nsys-rep file. QdstrmImporter is in the Host-x86_64 directory. When I ran it, it reported that libcap2.so could not be found, so I copied libcap2.so from Target-x86_64 to Host-x86_64. After that, the nsys profile command ran successfully.
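The last workaround can be sketched as a small shell helper. This is only an illustration of the steps described above, not an official procedure: the function names `copy_missing_lib` and `import_qdstrm` and the install-root argument are hypothetical, and the `--input-file` option is my assumption of how the importer takes its input, so check `QdstrmImporter --help` in your version first.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: copy a library that QdstrmImporter cannot find
# from Target-x86_64 into Host-x86_64, then convert the .qdstrm file by hand.
# Function names and the install-root argument are hypothetical.

# Copy every file in Target-x86_64 matching a name pattern into Host-x86_64,
# skipping files that already exist there. Prints the number of files copied.
copy_missing_lib() {
    local nsys_root="$1" pattern="$2"
    local src="$nsys_root/Target-x86_64" dst="$nsys_root/Host-x86_64"
    local f copied=0
    for f in "$src"/$pattern; do
        [ -e "$f" ] || continue                  # pattern matched nothing
        if [ ! -e "$dst/$(basename "$f")" ]; then
            cp -a "$f" "$dst/" && copied=$((copied + 1))
        fi
    done
    echo "$copied"
}

# Convert a QDSTRM file with the importer shipped in Host-x86_64.
# ASSUMPTION: --input-file is the input option; verify with --help.
import_qdstrm() {
    local nsys_root="$1" qdstrm="$2"
    "$nsys_root/Host-x86_64/QdstrmImporter" --input-file "$qdstrm"
}
```

With the Nsight Systems install directory in a variable such as `NSYS_ROOT` (a placeholder for wherever you unpacked the CLI), the steps would be `copy_missing_lib "$NSYS_ROOT" 'libcap*'` followed by `import_qdstrm "$NSYS_ROOT" report.qdstrm`. Running `ldd` on the QdstrmImporter binary and looking for "not found" lines is a quick way to see which libraries are actually missing before copying anything.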
I’m just recording my debugging experience here in case someone else runs into the same problem.