Nsight Systems not showing Tensor Active/FP16 for GPU GA10x

I’m taking the course Optimizing CUDA Machine Learning Codes With Nsight Profiling Tools. In the Python script, I wrapped the forward pass in "with torch.cuda.amp.autocast(enabled=True):" to enable autocasting and use FP16. However, Nsight Systems does not show FP16 in the name of the tensor metric row. I expected the row to be labeled

Tensor Active / FP16

but it shows only

Tensor Active

This is the command I used for profiling:

    !nsys profile --trace cuda,osrt,nvtx \
        --capture-range cudaProfilerApi \
        --gpu-metrics-device=all \
        --output /dli/task/nsys/thirdOptimization \
        --force-overwrite true \
        python3 /dli/task/nsys/application/main_opt3.py
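For reference, the same invocation can be assembled outside the notebook. The sketch below builds it as an argument list, using only the flags and paths from the command above. The helper name build_nsys_command is hypothetical and exists only for illustration.

```python
import shlex


def build_nsys_command(script, output):
    """Return the nsys invocation from the post as an argv list.

    Hypothetical helper: it only re-packages the flags quoted above.
    """
    return [
        "nsys", "profile",
        "--trace", "cuda,osrt,nvtx",
        "--capture-range", "cudaProfilerApi",
        "--gpu-metrics-device=all",
        "--output", output,
        "--force-overwrite", "true",
        "python3", script,
    ]


cmd = build_nsys_command(
    "/dli/task/nsys/application/main_opt3.py",
    "/dli/task/nsys/thirdOptimization",
)
# Print a shell-safe, single-line form of the command for inspection.
print(shlex.join(cmd))
```

Because of --capture-range cudaProfilerApi, collection only starts once the application calls cudaProfilerStart, so the script is assumed to do that around the region of interest (in PyTorch, for example, via torch.cuda.profiler.start() and stop()).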

Quoting from the documentation:

  • Tensor Active - sm__pipe_tensor_cycles_active_realtime.avg.pct_of_peak_sustained_elapsed
    The ratio of cycles the SM tensor pipes were active issuing tensor instructions to the number of cycles in the sample period as a percentage.
    TU102/4/6: This metric is not available on TU10x for periodic sampling. Please see Tensor Active/FP16 Active.
  • Tensor Active / FP16 Active - sm__pipe_shared_cycles_active_realtime.avg.pct_of_peak_sustained_elapsed
    TU102/4/6 only
    ^^^^^^^^^^^^
    The ratio of cycles the SM tensor pipes or FP16x2 pipes were active issuing tensor instructions to the number of cycles in the sample period as a percentage.
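Read literally, the quoted documentation suggests that the row label depends on the GPU chip rather than on whether FP16 is in use. The sketch below encodes that reading as a small lookup. The function name and the chip-name strings are assumptions for illustration, and it assumes that every chip other than TU102/4/6 that supports GPU metrics exposes the plain "Tensor Active" metric.

```python
# Chips for which the quoted documentation lists "Tensor Active / FP16 Active".
TURING_CHIPS = {"TU102", "TU104", "TU106"}

SUFFIX = ".avg.pct_of_peak_sustained_elapsed"


def tensor_metric_for(chip):
    """Return (row label, metric name) expected for a chip name.

    Hypothetical helper based only on the documentation excerpt above.
    """
    if chip.upper() in TURING_CHIPS:
        # TU102/4/6: tensor pipes and FP16x2 pipes are reported together.
        return ("Tensor Active / FP16 Active",
                "sm__pipe_shared_cycles_active_realtime" + SUFFIX)
    # Assumption: all other supported chips report the tensor pipe alone.
    return ("Tensor Active",
            "sm__pipe_tensor_cycles_active_realtime" + SUFFIX)


label, metric = tensor_metric_for("GA102")
print(label)
print(metric)
```

Under this reading, a GA10x GPU would be expected to show "Tensor Active" on its own, with FP16 tensor-core work counted in that row.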

Should this topic be under the Profiling Linux Targets category?
