I would like to do some profiling for neural network inference on the TX2.
I started with TensorRT's sampleGoogleNet example and modified it for my use case. The problem is that I found no documentation on how IProfiler works and what exactly it measures.
I also used nvprof together with the cudaProfilerStart() and cudaProfilerStop() functions to measure the context->execute(…) call, and the results differ from those reported by IProfiler.
So how do these two profiling approaches relate to each other, what exactly does each of them measure, and what is the recommended tool or methodology for benchmarking?
Thanks in advance for any information.