On a DGX box with 8 A100 GPUs, I’m trying to profile a program inside a Docker container. Specifically, I want to confirm that inter-GPU communication goes over NVLink, so I profile with the GPU metrics flag (--gpu-metrics-device). My command is: sudo docker run --name "hy_yeast_multi_0" --rm --cap-add=SYS_ADMIN -v $(pwd):/workspace nvlink nsys profile --gpu-metrics-device=all --force-overwrite true --output=/workspace/profile_report_multi_gpu.nsys-rep /workspace/galactose_rdmeode1.9_test_MultiGPU.py -id 0 -t 10 -g 11.1 -gpus "2,3"
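As posted, the command contains typographic quotes (“ ”). If those are in the actual command line and are not just forum formatting, the shell passes them through literally, so the container name and the -gpus value would include the curly characters. A minimal sketch, assuming Python 3.8+ and reusing only the image name, mount, flags, and script arguments from the post (the helper name build_profile_cmd is hypothetical), builds the argument list directly so no shell quoting is involved, then prints a copy-pasteable ASCII command line:

```python
import os
import shlex
from typing import List


def build_profile_cmd(workdir: str, gpus: str = "2,3") -> List[str]:
    """Hypothetical helper: the same docker + nsys command as in the post, as an argv list."""
    return [
        "sudo", "docker", "run",
        "--name", "hy_yeast_multi_0",
        "--rm",
        "--cap-add=SYS_ADMIN",
        "-v", f"{workdir}:/workspace",
        "nvlink",  # image name taken from the original command
        "nsys", "profile",
        "--gpu-metrics-device=all",
        "--force-overwrite", "true",
        "--output=/workspace/profile_report_multi_gpu.nsys-rep",
        "/workspace/galactose_rdmeode1.9_test_MultiGPU.py",
        "-id", "0", "-t", "10", "-g", "11.1",
        "-gpus", gpus,  # plain ASCII string, no quote characters inside the value
    ]


if __name__ == "__main__":
    cmd = build_profile_cmd(os.getcwd())
    # shlex.join adds ASCII quoting only where an argument actually needs it.
    print(shlex.join(cmd))
    # To run it directly instead: import subprocess; subprocess.run(cmd, check=True)
```

Because each argument is a separate list element, values such as 2,3 reach the script unchanged, without relying on the shell to strip quotes.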
However, I can’t see GPU metrics in the profiling result file, as shown below.
I don’t see anything in your nsys-rep file that would suggest an issue. However, your version of nsys is relatively old. Can you upgrade to the latest version of Nsight Systems and try the collection again? You can find the latest version at Nsight Systems - Get Started | NVIDIA Developer.
Sure! I downloaded and installed Nsight Systems 2024.5.1 inside the Docker container. I then ran the command sudo docker run --name "hy_yeast_multi_0" --rm --cap-add=SYS_ADMIN -v $(pwd):/workspace nvlink nsys profile --gpu-metrics-device=all --force-overwrite true --output=/workspace/profile_report_multi_gpu.nsys-rep /workspace/galactose_rdmeode1.9_test_MultiGPU.py -id 0 -t 10 -g 11.1 and got: