How can I observe NVLink traffic with NVVP and nvprof?

I am profiling a deep learning model built with TensorFlow and NCCL.
Checking nvidia-smi confirms there is heavy traffic on NVLink, and the ncclAllReduce calls should generate a lot of it.
However, NVVP does not show any of this traffic, and the NVLink analysis is almost empty (screen capture attached).
Will NVLink transfers show up in the timeline as memcpy[D2D]?

I profile the model with this command:

mpiexec --allow-run-as-root --bind-to socket -np 2 -x CUDA_VISIBLE_DEVICES=0,1 \
    numactl -N 0 -m 0 \
    nvprof -f -o /dev/shm/lennox/timeline.%q{OMPI_COMM_WORLD_RANK}.nvprof \
    python --layers 16 -b 32 -u batch -i 200 --log_dir=/data/learning/tmp/ \
    --data_dir=/data/learning/tf/models/research/inception/inception/data/ILSVRC2012/

You need to collect the NVLink metrics with nvprof for them to appear under NVLink Analysis in NVVP.

Use the following nvprof options:

nvprof --aggregate-mode off --event-collection-mode continuous -m nvlink_total_data_transmitted,nvlink_total_data_received,nvlink_transmit_throughput,nvlink_receive_throughput -o
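For example, here is an untested sketch of the launch command from the question with these metric options added. It assumes the same mpiexec/numactl setup and paths as the question. `your_script.py` is a hypothetical placeholder because the original command omits the script name, and the per-rank output name `metrics.%q{OMPI_COMM_WORLD_RANK}.nvprof` is only an illustrative choice. If mpiexec or nvprof is not on the PATH, the script prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: the question's launch command with the NVLink metrics from the answer added.
# ASSUMPTION: your_script.py is a hypothetical placeholder (the original command omits the script name).
# ASSUMPTION: the metrics output file name is illustrative; %q{...} is expanded per rank by nvprof.
set -euo pipefail

metrics=nvlink_total_data_transmitted,nvlink_total_data_received,nvlink_transmit_throughput,nvlink_receive_throughput

cmd=(
  mpiexec --allow-run-as-root --bind-to socket -np 2 -x CUDA_VISIBLE_DEVICES=0,1
  numactl -N 0 -m 0
  nvprof --aggregate-mode off --event-collection-mode continuous -m "$metrics"
  -f -o '/dev/shm/lennox/metrics.%q{OMPI_COMM_WORLD_RANK}.nvprof'
  python your_script.py --layers 16 -b 32 -u batch -i 200
  --log_dir=/data/learning/tmp/
  --data_dir=/data/learning/tf/models/research/inception/inception/data/ILSVRC2012/
)

if command -v mpiexec >/dev/null && command -v nvprof >/dev/null; then
  # Both tools are available: run the profiled job.
  "${cmd[@]}"
else
  # Tools not found: print the command so it can be inspected or copied.
  printf '%s\n' "${cmd[*]}"
fi
```

Writing the metrics to a different file than the timeline keeps the earlier timeline output from being overwritten; both files should then be importable into NVVP for the same run configuration.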