Jetson AGX Xavier INT8 Performance

Hi, I’m running inference with a CV image-detection network on Xavier in INT8 at batch size 1. I’m converting an ONNX model to TensorRT using the sample function provided. When I ran inference through nvprof, I saw roughly the same performance for the FP16 and INT8 versions, and I also noticed an incredibly high number of memcpy calls in the INT8 version (though the total times were the same). INT8 is supported on Xavier, so why don’t I see any speedup? Using TensorRT, CUDA 10.
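(For reference, a comparison like this can also be run with TensorRT's bundled trtexec tool instead of a custom harness. The file name `model.onnx` below is a placeholder for your own model; the path is where JetPack installs trtexec.)

```shell
# Build and time an FP16 engine from the ONNX model:
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16

# Build and time an INT8 engine. Without a calibration cache trtexec
# uses placeholder scales, so the output is only valid for timing:
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --int8
```

Comparing the reported latencies from these two runs isolates the precision change from any differences in your own inference harness.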

Clarification appreciated, thank you.


Have you maximized the CPU/GPU clocks first?

sudo ./
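(On JetPack the usual sequence is sketched below; the exact script name and location vary slightly between releases, so treat this as an assumption to verify against your install.)

```shell
# Select the maximum power profile (mode 0 = MAXN on AGX Xavier)
sudo nvpmodel -m 0

# Pin CPU/GPU/EMC clocks to their maximum
sudo jetson_clocks          # older releases ship this as jetson_clocks.sh
```

Without this, DVFS can keep the clocks low at batch size 1 and mask any precision speedup.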

Could you share with us which model you are testing?
Here is our benchmark result for the Jetson Xavier:

You can first check whether your results are similar to ours.

I have maximized clocks.

The benchmark results for ResNet-50 using Caffe models and bs=1 look fine: I get ~2 ms for INT8 and ~3 ms for FP16. It says there is no ONNX model support for INT8.

I cannot share the model right now, but I went and checked the GPU trace, and it seems I’m getting a lot of tensor conversions (cuInt8::nchwToNcqhw4 and vice versa), about four times as many in INT8 as in FP16. Is there a function that forces the conversion for INT8? Each conversion takes about as long as the computation itself.
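(For context on what that kernel is doing: NCQHW4 is a channel-vectorized layout in which channels are packed in groups of four, so INT8 kernels can load four channel values per 32-bit word. Below is a rough NumPy illustration of such a repack; the padding and axis order are an assumption for illustration, not TensorRT's actual implementation.)

```python
import numpy as np

def nchw_to_ncqhw4(x):
    """Repack an NCHW tensor into NC/4HW4: channels grouped in fours,
    zero-padded to a multiple of 4, with the group of 4 innermost."""
    n, c, h, w = x.shape
    cq = (c + 3) // 4                       # number of 4-channel groups (Q)
    padded = np.zeros((n, cq * 4, h, w), dtype=x.dtype)
    padded[:, :c] = x                       # zero-pad channels up to 4*Q
    # (N, Q*4, H, W) -> (N, Q, 4, H, W) -> (N, Q, H, W, 4)
    return padded.reshape(n, cq, 4, h, w).transpose(0, 1, 3, 4, 2)

x = np.arange(2 * 3 * 2 * 2, dtype=np.int8).reshape(2, 3, 2, 2)
y = nchw_to_ncqhw4(x)
print(y.shape)  # (2, 1, 2, 2, 4): 3 channels padded to one group of 4
```

Every time a layer runs in a precision/format that its neighbor does not share, TensorRT has to insert a reformat like this in both directions, which is why a plugin or unsupported layer in the middle of an INT8 network can multiply the conversion count.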

I see the conversion between some cuDNN calls (trt_volta_int8x4_icudnn_int8x4) and this (ZN6thrust8cuda_cub4core13_kernel_agentINS0_14__parallel_for16ParallelForAgentINS0_11__transform17unary_transform_fIPKfPfNS5_14no_stencil_tagEZN21FancyActivationPlugin9doEnqueueIfEEiiPKPKvPPvSH_P11CUstream_stEUlfE_NS5_21always_true_predicateEEElEESN_lEEvT0_T1), which looks like an input enqueue? Could I solve the problem by restructuring my data inputs?

Thanks for the prompt response.


For the Caffe framework, you can feed the caffemodel into TensorRT directly.
It’s not required to convert it into an ONNX model.

Please try building the TensorRT engine from the caffemodel first. It may give you better performance.
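(A quick way to try that with trtexec; the file names `deploy.prototxt` / `model.caffemodel` and the output blob name `prob` below are placeholders for your own network:)

```shell
/usr/src/tensorrt/bin/trtexec \
    --deploy=deploy.prototxt \
    --model=model.caffemodel \
    --output=prob \
    --int8
```

If the Caffe path runs INT8 without the reformat kernels you saw in the trace, that points at the ONNX import path (or a plugin it inserts) as the source of the extra conversions.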