Profiling TensorRT network on TX2 - nvprof vs. IProfiler

Hi,

I would like to do some profiling for neural network inference on the TX2.
I started with the sampleGoogleNet example from TensorRT and modified it for my use case. The problem is that I found no information on how IProfiler works and what exactly it measures.

I also used the nvprof tool with the cudaProfilerStart() and cudaProfilerStop() functions to measure the context->execute(…) call, and the results differ from those reported by IProfiler.

So how do these two profiling tools relate to each other, what exactly do they measure, and what is the recommended tool/methodology for benchmarking?

Thanks in advance for any information.
Lisa

Hi,

IProfiler measures the inference time of each layer and does its timing at the TensorRT engine level:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#profiling
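A minimal sketch of how IProfiler is typically used: you derive from the interface, override reportLayerTime(), and TensorRT calls it once per layer per execution with the time it measured. The interface declaration below is a stand-in copied from nvinfer1::IProfiler in NvInfer.h so the sketch compiles standalone; the exact signature may differ between TensorRT versions (newer releases add noexcept).

```cpp
#include <cstdio>
#include <map>
#include <string>

// Stand-in for nvinfer1::IProfiler (normally from NvInfer.h), shown here so
// the sketch is self-contained. TensorRT invokes reportLayerTime() once per
// layer on every profiled execution.
struct IProfiler
{
    virtual void reportLayerTime(const char* layerName, float ms) = 0;
    virtual ~IProfiler() = default;
};

// Accumulates per-layer times so several runs can be averaged.
struct LayerProfiler : public IProfiler
{
    std::map<std::string, float> totalMs;

    void reportLayerTime(const char* layerName, float ms) override
    {
        totalMs[layerName] += ms;
    }

    void print(int runs) const
    {
        float sum = 0.f;
        for (const auto& p : totalMs)
        {
            std::printf("%-40s %.3f ms\n", p.first.c_str(), p.second / runs);
            sum += p.second / runs;
        }
        std::printf("%-40s %.3f ms\n", "total", sum);
    }
};
```

With the real TensorRT headers, you would attach it before running inference, e.g. context->setProfiler(&profiler); followed by context->execute(batchSize, buffers); so each layer's time lands in the map. Note that enabling a profiler forces synchronous execution, which is one reason these numbers need not match an end-to-end measurement.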

nvprof is a generic profiling tool and measures duration at the application level:
https://docs.nvidia.com/cuda/profiler-users-guide/index.html
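For the nvprof side, a typical invocation when the application brackets the region of interest with cudaProfilerStart()/cudaProfilerStop(), as described in the question, looks like this (the binary name is just a placeholder):

```shell
# Collect data only between cudaProfilerStart() and cudaProfilerStop()
nvprof --profile-from-start off ./sample_googlenet
```

Because nvprof reports individual CUDA kernel and memcpy durations rather than TensorRT layers (one layer may launch several kernels, or several layers may be fused into one kernel), its output will not line up one-to-one with IProfiler's per-layer numbers.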

Thanks.

Thanks for the fast reply!