Jetson AGX Xavier Deep Learning Inference Benchmarks

Hi all, we’ve published a comprehensive set of deep learning inference performance and energy efficiency benchmarks with Jetson AGX Xavier.

See here for the results — [b][url]https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks[/url][/b]

NVIDIA will continue improving performance with software optimizations and feature enhancements in future releases of JetPack.
The results above also include estimates of future performance that incorporate these improvements.

Note: data from TX1/TX2 is available for comparison here — [b][url]https://www.nvidia.com/en-us/data-center/resources/inference-technical-overview/[/url][/b]

We’ve posted a technical blog on the Jetson AGX Xavier architecture, including an in-depth analysis of the benchmarking results.

Check it out here — [b][url]https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/[/url][/b]

The inference benchmarks page simply says to run ./trtexec (with options), which is not very helpful. Unless I missed something, or perhaps it was different in previous JetPack versions (I’m using 4.2), I had to figure out on my own where this executable is located.

I found /usr/src/tensorrt/samples/trtexec and ran “sudo make”, without first checking whether there was already a /usr/src/tensorrt/bin/. The make complained about:

../Makefile.config:5: CUDA_INSTALL_DIR variable is not specified, using /usr/local/cuda by default, use CUDA_INSTALL_DIR=<cuda_directory> to change.
../Makefile.config:8: CUDNN_INSTALL_DIR variable is not specified, using $CUDA_INSTALL_DIR by default, use CUDNN_INSTALL_DIR=<cudnn_directory> to change.

but it finished the compile, and the executable now appears in /usr/src/tensorrt/bin/. So I don’t know whether it compiled correctly.

It would be great if this were pre-compiled (linked against the correct CUDA and cuDNN install paths) and already on the PATH, so that “trtexec” could be run from anywhere in the terminal, like nvpmodel and jetson_clocks are (at least they are for me in 4.2).
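In the meantime, a one-line workaround is to add that directory to the PATH in ~/.bashrc (this assumes the binary really does live in /usr/src/tensorrt/bin, which is where it ended up for me on 4.2):

# append the TensorRT sample binaries to the shell search path
export PATH=$PATH:/usr/src/tensorrt/bin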

EDIT: actually it looks like it doesn’t matter, because every time I try to run ./trtexec from /usr/src/tensorrt/bin/, it always says

Could not open file XXXX
CaffeParser: Could not parse deploy file

where I’ve tried, for example, "../data/googlenet.prototxt". I can definitely see some files in /usr/src/tensorrt/data/, but it doesn’t seem to be working.

Would be great if the inference benchmark documentation could be updated for JetPack 4.2. Or maybe now it’s supposed to be run from the Python API? Again, documentation …

The trtexec binary does come pre-compiled. It’s located in /usr/src/tensorrt/bin.

For JetPack 4.2, the correct path would be "../data/googlenet/googlenet.prototxt". For example:

$ cd /usr/src/tensorrt/bin
$ ./trtexec --avgRuns=100 --deploy=../data/googlenet/googlenet.prototxt --fp16 --iterations=1000 --output=prob

Thanks!

So have I screwed up the pre-compiled binary by running the makefile in samples/trtexec, because of the error messages it showed?

It would be great if the instructions on the webpage simply showed where you needed to navigate to in order to execute trtexec.

And in JetPack 4.2 the file structure has obviously changed, because GoogleNet is the only .prototxt file located in ../data; there isn’t anything for ResNet18 FCN, ResNet50, etc. that are shown on the web page.
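(For what it’s worth, this is how I checked; the path is just my JetPack 4.2 install, so adjust if yours differs:)

# list every Caffe deploy file shipped with the TensorRT samples
$ find /usr/src/tensorrt/data -name "*.prototxt"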

Better yet, I was really just trying to run something to verify my Jetson was functioning properly out of the box. It would be great if there were a “jetson_startup_test” that could be run from anywhere in the terminal and that exercised pre-compiled code testing CUDA, cuDNN, and TensorRT.
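Something as simple as the following three checks, wrapped in a script, would already cover it (the paths are just my guesses at the stock JetPack 4.2 layout):

# quick sanity checks for CUDA, cuDNN, and TensorRT
$ /usr/local/cuda/bin/nvcc --version
$ ls /usr/lib/aarch64-linux-gnu/libcudnn.so*
$ cd /usr/src/tensorrt/bin && ./trtexec --deploy=../data/googlenet/googlenet.prototxt --output=prob --iterations=10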

OK, so at least for running the GoogleNet, how do I tell if I’m meeting the benchmark? The only output from running the command is something like:

“Average over 1 runs is 4.03xxxxxx ms (host walltime is 4.06xxxx ms, 99% percentile time is 4.03xxxx ms)”

That’s roughly the lowest the time goes regardless of parameters, if the batch is 1. Running more iterations or avgRuns just makes it take longer to finish.

Does this mean my Jetson is running 4x slower than it’s supposed to? Is this supposed to be run from a terminal before the GUI boots up?

Thanks!

If it didn’t compile, then the binary wouldn’t have been overwritten, so you should be fine.

To get images per second, divide 1000 by the time reported there. You should launch trtexec to measure over many runs (like in the example commands). The first run is typically slow because the clock frequency governor needs time to spin up the clocks (or you can run ‘sudo jetson_clocks’ beforehand to disable the frequency governor).
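As a quick worked example with the number you quoted (batch size 1, 4.03 ms average):

# images per second ≈ batch_size * 1000 / average_latency_ms
$ python3 -c "print(1 * 1000 / 4.03)"    # roughly 248, i.e. ~248 images/sec at batch 1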

The benchmark numbers on the website are reported from the GPU (INT8) and the two DLAs (FP16) running concurrently. Hence those trtexec commands can be run at the same time and their images per second added together.
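For illustration, concurrent runs would look something like the following, one per terminal. The flag names are from my memory of the TensorRT 5.x trtexec and the batch sizes are just placeholders, so check ./trtexec --help and the benchmark page for the exact settings:

# terminal 1: GPU at INT8
$ ./trtexec --avgRuns=100 --deploy=../data/googlenet/googlenet.prototxt --int8 --batch=8 --iterations=1000 --output=prob
# terminal 2: DLA core 0 at FP16
$ ./trtexec --avgRuns=100 --deploy=../data/googlenet/googlenet.prototxt --fp16 --batch=2 --iterations=1000 --output=prob --useDLACore=0 --allowGPUFallback
# terminal 3: DLA core 1 at FP16
$ ./trtexec --avgRuns=100 --deploy=../data/googlenet/googlenet.prototxt --fp16 --batch=2 --iterations=1000 --output=prob --useDLACore=1 --allowGPUFallback

Sum the images per second from the three runs to compare against the published combined figure.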

Hi dusty_nv, thanks for the replies and continued patience!

It did compile, which was the worrisome part for me. Not so much for CUDA, because the fallback in the warning message was in fact correct, but for cuDNN: the cuDNN libraries are not installed under /usr/local/cuda; rather, they’re in /usr/lib/aarch64-linux-gnu/

So I simply added the following to the bottom of my .bashrc in home:

export CUDA_INSTALL_DIR=/usr/local/cuda
export CUDNN_INSTALL_DIR=/usr/lib/aarch64-linux-gnu

I recompiled trtexec, and also compiled sampleGoogleNet. Both compiled fine, no warnings. And I used sudo jetson_clocks before running anything in bin/.
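For anyone else who hits the same warnings: the same variables can also be passed directly on the make command line instead of through .bashrc (the variable names are the ones from the Makefile.config warnings; the paths are from my JetPack 4.2 install):

# rebuild trtexec with explicit CUDA/cuDNN locations
$ cd /usr/src/tensorrt/samples/trtexec
$ sudo make CUDA_INSTALL_DIR=/usr/local/cuda CUDNN_INSTALL_DIR=/usr/lib/aarch64-linux-gnu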

trtexec definitely shows better performance now, though oddly enough it seems to take much longer to print to the terminal. I’m still just using avgRuns=1 because I wanted it to print quickly. When I set it to INT8 with 10 iterations, the runs were now averaging more like 1.8-1.9 ms.

However, sample_googlenet does not seem to be working correctly, even though I’m giving it the correct path to googlenet.prototxt. Its output just says:

Building and running a GPU inference engine for GoogleNet
Ran ./sample_googlenet with 
Input(s): data 
Output(s): prob 
Done.

which is quite different from what is shown on this page: https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#googlenet_sample

Any final suggestions?

Hi leka004, I get that same output with JetPack 4.2. I think the sample must have been updated for TensorRT 5.1.5, which is the version that those online docs reference. The version of TensorRT in JetPack 4.2 is TensorRT 5.0.6.

Thanks! Perhaps JP 4.2.1 will have TRT 5.1.5?

Yes, it actually has TRT 5.1.6.

Great! So to get JP 4.2.1 onto my Jetson AGX Xavier, I will need to re-flash it from a host PC running Ubuntu 18.04 (by the way, this is an unfortunate requirement, as many of my Linux machines are already past that … I hope newer versions of SDKM will accept newer Ubuntu releases). And I’ll lose anything I’ve got on there now, so I should back up any code or programs? Thanks again!

That’s correct, you’ll need to back up anything you want to save. We have plans to move to OTA in-place upgrades in the future, but for now re-flashing for a new JetPack is still required.
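One simple approach is to copy your home directory over the network to the host PC before flashing; a sketch (using rsync, assuming it is installed; the hostname and destination path are placeholders):

# run on the Xavier before re-flashing; substitute your own host and path
$ rsync -avh ~/ user@host-pc:~/xavier-home-backup/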

Hi dusty_nv,

When is JetPack 4.2.1 going to be released?
Can you please share a tentative timeline?
We’re waiting for INT8 DLA support.

Hi BMohit, we are aiming to release it next week, stay tuned.

Hi dusty_nv,

We got the following differences in results for the networks below:

We used data type FP16, input shape (1, 3, 224, 224), and torch2trt.

1.) AlexNet (our throughput: 146 vs. 565 on the NVIDIA website)
2.) SqueezeNet 1.0 (our throughput: 111 vs. 121 on the NVIDIA website)
3.) SqueezeNet 1.1 (our throughput: 115 vs. 125 on the NVIDIA website)
4.) ResNet18 (our throughput: 349 vs. 722 on the NVIDIA website)
5.) ResNet34 (our throughput: 249 vs. 396 on the NVIDIA website)
6.) ResNet50 (our throughput: 227 vs. 326 on the NVIDIA website)
7.) ResNet101 (our throughput: 84.7 vs. 175 on the NVIDIA website)
8.) ResNet152 (our throughput: 122 vs. 122 on the NVIDIA website)
9.) DenseNet121 (our throughput: 162 vs. 76.6 on the NVIDIA website)

Setup details:

We are using a Jetson AGX Xavier with JetPack 4.2.2 and TensorRT 5.1.6.

Thanks,

Hi bajpai9, were you running your Xavier in MAX-N mode (sudo nvpmodel -m 0) and had you also run the jetson_clocks script beforehand?
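For reference, the sequence we recommend before benchmarking is roughly the following (nvpmodel -q just queries the active mode, if you want to double-check):

$ sudo nvpmodel -m 0    # select the MAX-N power mode
$ sudo jetson_clocks    # lock the clocks at their maximum frequencies
$ sudo nvpmodel -q      # confirm which mode is active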

I would open an issue on the torch2trt Issues tab so the maintainer of that project can respond, thanks. This thread is about the official benchmarks.