NIC metric data was not collected

tizhong · December 22, 2021, 7:53am

I am using Nsight Systems 2021.5.1 in the nvcr.io/nvidia/pytorch:20.12-py3 Docker container.
MLNX_OFED_LINUX-5.2-1.0.4.0 is installed in the container.
I was profiling a PyTorch DDP job on 2 NVIDIA DGX A100 machines.
There were no error messages during the run, but error messages appear in the nsys-rep files when I open them in the GUI. Everything is fine except that NIC metric data was not collected.
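
For anyone hitting the same message, one thing that can be checked independently of Nsight Systems is whether the container can see any RDMA devices at all. The sketch below is only a sanity check, not a diagnosis of this issue: it assumes the standard Linux sysfs location /sys/class/infiniband, and the function name list_rdma_devices is made up for illustration.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of RDMA devices found under sysfs_root.

    Returns an empty list if the directory does not exist, which is
    what you would typically see when no RDMA device is exposed to
    the container.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices visible:", ", ".join(devices))
    else:
        print("No RDMA devices visible under /sys/class/infiniband")
```

If the list is empty inside the container but not on the host, the devices are probably not being passed through to the container, which would be a separate problem from the profiler itself.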

[Screenshot: NICerror]

hwilper · December 22, 2021, 2:53pm

Hello, could you give me the exact command line you used? (If you do not remember it, it should be in the analysis summary view.)

Also, would you feel comfortable sending us the .nsys-rep file?

@ytebeka you may want to take a look at this.

ytebeka · December 22, 2021, 4:52pm

Yep, it would be nice to:
a. See the command line that you used
b. Have a look at a generated .nsys-rep file, if it is possible.

tizhong · December 23, 2021, 2:41am

Hi @hwilper @ytebeka. Thanks for your quick response.
The command line I used was like this:

/opt/nvidia/nsight-systems/2021.5.1/bin/nsys profile \
    -t cuda,cudnn,nvtx \
    -o $reportName \
    --force-overwrite true \
    --gpu-metrics-set=ga100 \
    --gpu-metrics-device 0 \
    --nic-metrics true \
    python3 -m torch.distributed.launch \
        --nproc_per_node 8 \
        --nnodes $n_node \
        --node_rank $rank \
        --master_addr $addr \
        --master_port $port \
        $script
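
When the same command has to be launched on each node with a different rank, it can help to assemble it programmatically. The following is a minimal sketch that builds the command above as an argument list; it uses only the flags shown in this post. The function name build_nsys_command and the example values in the main block (report name, address, port, script name) are placeholders, not the values from the actual run.

```python
import shlex


def build_nsys_command(report_name, n_node, rank, addr, port, script,
                       nsys="/opt/nvidia/nsight-systems/2021.5.1/bin/nsys"):
    """Build the nsys profile command from this post as an argument list."""
    return [
        nsys, "profile",
        "-t", "cuda,cudnn,nvtx",
        "-o", str(report_name),
        "--force-overwrite", "true",
        "--gpu-metrics-set=ga100",
        "--gpu-metrics-device", "0",
        "--nic-metrics", "true",
        # Everything after this point is the profiled application.
        "python3", "-m", "torch.distributed.launch",
        "--nproc_per_node", "8",
        "--nnodes", str(n_node),
        "--node_rank", str(rank),
        "--master_addr", str(addr),
        "--master_port", str(port),
        str(script),
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = build_nsys_command("report_node0", 2, 0, "10.0.0.1", 29500, "train.py")
    print(shlex.join(cmd))
```

The list can be passed directly to subprocess.run, which avoids shell quoting problems, and the printed form is handy to paste into a bug report.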

I have sent the nsys-rep file to @ytebeka by message.

Thanks for your attention to this.

tizhong · December 28, 2021, 1:24am

Hi ytebeka. Did you read my message? Should I upload the nsys-rep file here?

ytebeka · December 28, 2021, 8:50am

Hi tizhong,

Sorry for not answering earlier, I was on a short vacation.
I got the nsys-rep file, and it indeed does not contain NIC metrics.

I will use the container you used and try to reproduce the problem.
I’ll update here when I have results.

Yaki

tizhong · December 29, 2021, 4:26am

Hi @ytebeka thanks for your response. As for the details of the container, please refer to

superbenchmark/cuda11.1.1.dockerfile at main · microsoft/superbenchmark (github.com)

I hope this is helpful. Thanks for your attention and help.

ytebeka · January 11, 2022, 3:49pm

A fix for the described problem is planned to be added to the next Nsight Systems release.

