Profiling TensorRT network on TX2 - nvprof vs. IProfiler


I would like to do some profiling for neural network inference on the TX2.
I started with the sampleGoogleNet example of TensorRT and modified it for my use case. The problem is that I found no information on how IProfiler works and what exactly it measures.

I also used the nvprof tool with the cudaProfilerStart() and cudaProfilerStop() functions to measure the context->execute(…) call, and the results differ from the ones reported by IProfiler.

So how do those two profiling tools relate to each other, what exactly do they measure, and what is the recommended tool/methodology for benchmarking?

Thanks in advance for any information.


IProfiler measures the inference time of each layer; the timing is done at the TensorRT engine level.

nvprof is a generic CUDA profiling tool and measures duration at the application level, so it also captures CUDA API overhead, memory transfers, and kernel launches outside the engine's per-layer timing.
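Since you already added cudaProfilerStart()/cudaProfilerStop() around context->execute(…), you can tell nvprof to capture only that region. A sketch of the invocation (the binary name is a placeholder for your modified sample):

```shell
# Only profile the region between cudaProfilerStart() and cudaProfilerStop()
nvprof --profile-from-start off ./sample_googlenet

# Per-kernel timeline, including memcpy and launch overhead
nvprof --print-gpu-trace ./sample_googlenet
```

Comparing the GPU trace against the IProfiler per-layer sums usually shows where the difference comes from: nvprof's totals include host-side API time and transfers that IProfiler never sees.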


Thanks for the fast reply!