Jetson TX2 Benchmark

Hi everyone,

I was reading through the blog and was wondering how the throughput benchmark for AlexNet and GoogLeNet was performed, and how such a high FPS was achieved. The link for the benchmark follows:

Was only the forward pass measured, without taking memory transfers into account?

Thanks for any hints.

Hi,

We have published updated benchmark scores for a newer package in this blog:

TX2 is designed as an end device, so we place more value on inference performance and power.

There is no report available on memory transfers for deep learning use cases.
Some relevant information can be found in the module datasheet:
https://developer.nvidia.com/embedded/dlc/jetson-tx2-module-datasheet

Thanks.

Hi AastaLLL,

First of all, thank you for your answer.

My question is more about the methodology. I would like to know which parts of the code were timed to measure execution time and, from that, estimate the throughput.

Thanks a lot.

Hi,

TensorRT provides a profiling interface called IProfiler.
You can find sample code that uses it in /usr/src/tensorrt/samples/sampleGoogleNet/sampleGoogleNet.cpp.

More details can also be found in our documentation here:
http://docs.nvidia.com/deeplearning/sdk/tensorrt-api/topics/classnvinfer1_1_1_i_profiler.html
[i]------------
application-implemented interface for profiling

When this class is added to an execution context, the profiler will be called once per layer for each invocation of execute(). Note that enqueue() does not currently support profiling.

the profiler will only be called after execution is complete. It has a small impact on execution time.
------------[/i]
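
As a rough illustration of how per-layer times reported through such a callback could be turned into a throughput figure, here is a minimal sketch. It is not the code used for the blog numbers. To keep it compilable without TensorRT, it uses a hypothetical stand-in base class, LayerTimeSink, that mirrors the reportLayerTime(const char* layerName, float ms) callback of nvinfer1::IProfiler; the class name LayerProfiler and the helper throughputFps are also made up for this example. Note that a figure derived this way covers layer execution only, not any host/device copies the application performs around execute().

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for nvinfer1::IProfiler so this sketch builds without
// TensorRT. With TensorRT installed, derive from nvinfer1::IProfiler instead
// (newer releases also expect noexcept on the override).
struct LayerTimeSink {
    virtual void reportLayerTime(const char* layerName, float ms) = 0;
    virtual ~LayerTimeSink() = default;
};

// Accumulates the time reported for each layer across repeated executions.
class LayerProfiler : public LayerTimeSink {
public:
    void reportLayerTime(const char* layerName, float ms) override {
        assert(layerName != nullptr && ms >= 0.0f);
        for (auto& entry : layers_) {
            if (entry.first == layerName) {
                entry.second += ms;  // same layer seen again in a later run
                return;
            }
        }
        layers_.emplace_back(layerName, ms);
    }

    // Sum of all recorded layer times in milliseconds, over all runs so far.
    float totalMs() const {
        float sum = 0.0f;
        for (const auto& entry : layers_) sum += entry.second;
        return sum;
    }

    // Images per second implied by layer time alone (no memory copies).
    // Returns 0 if nothing was recorded or runs is not positive.
    float throughputFps(int batchSize, int runs) const {
        const float total = totalMs();
        if (total <= 0.0f || runs <= 0) return 0.0f;
        const float msPerBatch = total / static_cast<float>(runs);
        return 1000.0f * static_cast<float>(batchSize) / msPerBatch;
    }

    // Prints the average time per run for each layer, in first-seen order.
    void print(int runs) const {
        if (runs <= 0) return;
        for (const auto& entry : layers_) {
            std::printf("%-40s %8.3f ms\n", entry.first.c_str(),
                        entry.second / static_cast<float>(runs));
        }
    }

private:
    std::vector<std::pair<std::string, float>> layers_;
};
```

In a real application, an instance of such a class would be attached to the execution context and execute() called a fixed number of times with a known batch size; throughputFps(batchSize, runs) then gives the frames per second implied by the summed layer times. Comparing that against a wall-clock measurement that also includes the input and output copies shows how much the transfers contribute.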

Thanks.