I'm taking the course Optimizing CUDA Machine Learning Codes With Nsight Profiling Tools. In the Python script, I enabled autocasting for the forward pass so that it uses FP16:
with torch.cuda.amp.autocast(enabled=True):
However, Nsight Systems is not showing any FP16 in the Tensor Active row. I expected the row to be labeled
Tensor Active / FP16
but it shows only
Tensor Active
This is the command I used for profiling:
!nsys profile --trace cuda,osrt,nvtx \
    --capture-range cudaProfilerApi \
    --gpu-metrics-device=all \
    --output /dli/task/nsys/thirdOptimization \
    --force-overwrite true \
    python3 /dli/task/nsys/application/main_opt3.py
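For anyone who wants to reproduce this outside the notebook, the same invocation can be sketched as a small Python helper. This is only an illustration: `build_nsys_command` is a hypothetical name, the flags are exactly the ones from the command above, and actually running it assumes `nsys` is on the PATH.

```python
import shlex


def build_nsys_command(script, output, capture_range="cudaProfilerApi"):
    """Return the nsys argument list from the command above (hypothetical helper)."""
    return [
        "nsys", "profile",
        "--trace", "cuda,osrt,nvtx",
        "--capture-range", capture_range,
        "--gpu-metrics-device=all",   # needed for the GPU metrics rows
        "--output", output,
        "--force-overwrite", "true",
        "python3", script,
    ]


cmd = build_nsys_command(
    script="/dli/task/nsys/application/main_opt3.py",
    output="/dli/task/nsys/thirdOptimization",
)

# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))

# To actually profile (requires nsys to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if the script or output path ever contains spaces.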
Tensor Active - sm__pipe_tensor_cycles_active_realtime.avg.pct_of_peak_sustained_elapsed
The ratio of cycles the SM tensor pipes were active issuing tensor instructions to the number of cycles in the sample period as a percentage.
TU102/4/6: This metric is not available on TU10x for periodic sampling. Please see Tensor Active/FP16 Active.
Tensor Active / FP16 Active - sm__pipe_shared_cycles_active_realtime.avg.pct_of_peak_sustained_elapsed
TU102/4/6 only
The ratio of cycles the SM tensor pipes or FP16x2 pipes were active issuing tensor instructions to the number of cycles in the sample period as a percentage.
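If I read the excerpt above literally, which row appears depends on the GPU chip rather than on whether FP16 is used. The sketch below just encodes that reading. The function name `expected_tensor_row` and the way the chip name is passed in are my own assumptions, and `GA100` is only an arbitrary example of a non-TU10x chip; the metric names are copied from the excerpt.

```python
# Metric labels and names copied from the documentation excerpt above.
TENSOR_ACTIVE = (
    "Tensor Active",
    "sm__pipe_tensor_cycles_active_realtime.avg.pct_of_peak_sustained_elapsed",
)
TENSOR_OR_FP16_ACTIVE = (
    "Tensor Active / FP16 Active",
    "sm__pipe_shared_cycles_active_realtime.avg.pct_of_peak_sustained_elapsed",
)

# "TU102/4/6" in the excerpt, written out in full.
TU10X_CHIPS = {"TU102", "TU104", "TU106"}


def expected_tensor_row(chip_name):
    """Return (label, metric) the excerpt implies for a given chip name.

    Hypothetical helper: it only mirrors the quoted text and does not
    query the GPU or Nsight Systems.
    """
    chip = chip_name.strip().upper()
    if chip in TU10X_CHIPS:
        return TENSOR_OR_FP16_ACTIVE
    return TENSOR_ACTIVE


for chip in ("TU104", "GA100"):
    label, metric = expected_tensor_row(chip)
    print(f"{chip}: {label} ({metric})")
```

Under this reading, a GPU that is not TU102/4/6 would show only Tensor Active even when autocast is running the forward pass in FP16, but I would appreciate confirmation that this is the intended behavior.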
Should this topic be under the Profiling Linux Targets category?