Hi, I want to profile the per-layer execution times of the backward pass of a DNN. Existing profilers such as the PyTorch profiler give operator-level statistics, but I want layer-wise times. Does NVIDIA have a profiler that can do this?
Would DLProf (DLProf User Guide - NVIDIA Docs) help with this?
Device: Orin, JetPack v5.0.1
Using: PyTorch v2.0.0
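For comparison, here is a rough layer-wise backward timing in plain PyTorch using module backward hooks. This is a minimal sketch under stated assumptions, not DLProf and not an NVIDIA tool. `attach_backward_timers` is a hypothetical helper and the toy `nn.Sequential` model is illustrative only. It relies on `register_full_backward_pre_hook` (available from PyTorch 2.0) and `register_full_backward_hook`. It measures host wall-clock time between "gradients w.r.t. the layer's outputs are ready" and "gradients w.r.t. its inputs are ready". On a GPU it calls `torch.cuda.synchronize()` in every hook, which serializes execution, so the numbers should be treated as approximate.

```python
import time
from collections import defaultdict

import torch
import torch.nn as nn


def attach_backward_timers(model: nn.Module):
    """Hypothetical helper: time the backward pass of every leaf module.

    The pre-hook fires when gradients w.r.t. the module's outputs are ready;
    the full backward hook fires once gradients w.r.t. its inputs are computed.
    """
    starts = {}
    times = defaultdict(float)  # accumulated seconds per layer name
    handles = []
    use_cuda = torch.cuda.is_available()

    def sync():
        # CUDA kernels are asynchronous; synchronizing makes the host clock
        # meaningful but adds overhead and serializes GPU work.
        if use_cuda:
            torch.cuda.synchronize()

    for name, module in model.named_modules():
        if list(module.children()):
            continue  # only time leaf layers (Linear, ReLU, ...)

        def pre_hook(mod, grad_output, name=name):
            sync()
            starts[name] = time.perf_counter()

        def post_hook(mod, grad_input, grad_output, name=name):
            sync()
            times[name] += time.perf_counter() - starts.pop(name)

        handles.append(module.register_full_backward_pre_hook(pre_hook))
        handles.append(module.register_full_backward_hook(post_hook))
    return times, handles


# Illustrative usage on a toy model (assumed, not the poster's network).
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
times, handles = attach_backward_timers(model)

# requires_grad=True so the first layer's hook fires after its input
# gradient is actually computed.
x = torch.randn(8, 32, requires_grad=True)
model(x).sum().backward()

for name, seconds in times.items():
    print(f"layer {name}: {seconds * 1e3:.3f} ms")

for h in handles:
    h.remove()  # detach the hooks when done
```

Hooks only see module boundaries, so work that autograd does outside a module's own nodes (for example gradient accumulation into shared parameters) is not attributed to any layer.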