YOLOv3 FPS on TensorRT

Hi,

This post: https://devblogs.nvidia.com/jetson-xavier-nx-the-worlds-smallest-ai-supercomputer/ reports inference close to 100 FPS for YOLOv3 (608x608) on AGX Xavier with TensorRT (Figure 3).

We followed the /usr/src/tensorrt/samples/python/yolov3_onnx sample and ran YOLOv3 (608x608) on AGX Xavier with TensorRT, but we only get 9 FPS, and with TensorRT INT8 only 32 FPS.

Why can't we reach this high FPS?

The environment is as follows:
OS: Ubuntu 18.04
TensorRT: 6.0.1

power mode:

nvpmodel -q
NV Fan Mode:quiet
NV Power Mode: MAXN

Please help.

Hi,

Have you maximized the device clocks?

sudo jetson_clocks

Thanks.

Hi,

I have tried maximizing the device clocks, but the FPS did not improve.

wow, @AastaLLL, this was not helpful…

We have the same problem here: only 9 FPS with YOLOv3 416x416, and more like 6 FPS with 608x608.

Hi,

I’m checking this issue with our internal team.
Will update more information later.

For benchmarking, it's recommended to use trtexec.

/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=[MAX] --int8

Thanks.

Hi,

We got the feedback from our internal team.

Xavier uses the GPU plus 2 DLAs to achieve that throughput.
GPU latency is 26 ms and DLA latency is 32.5 ms.
Total throughput is 100 FPS.
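If the three engines really run fully in parallel, the arithmetic can be checked with a rough back-of-the-envelope model (my sketch, not an official benchmark: it assumes each engine independently contributes 1/latency frames per second):

```python
# Rough throughput model: one GPU engine plus two DLA engines running in
# parallel, each contributing 1/latency frames per second.
gpu_latency_s = 0.026   # 26 ms GPU latency, from the numbers above
dla_latency_s = 0.0325  # 32.5 ms latency per DLA core

total_fps = 1.0 / gpu_latency_s + 2.0 / dla_latency_s
print(f"{total_fps:.1f} FPS")  # approximately 100.0 FPS
```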

Thanks.

Thanks for the info. It seems the GPU alone should be able to do 30 FPS then. Can you point me to an example of how to run YOLO at that speed? Because, as I said, for us it is 6 FPS.

Thanks for the info, too.
In our implementation, YOLOv3 (COCO object detection, 608x608) costs 102 ms per image in Darknet (floating point), 110 ms in TensorRT (floating point), and 29.3 ms in TensorRT (INT8).

  1. It seems your claim "GPU latency is 26ms" is measured with TensorRT (INT8). Is that right?

  2. Total throughput is 100 fps.
    Is this parallel computing by the GPU and 2 DLAs?
    If yes, is the FPS estimated from the slowest of {GPU, DLA1, DLA2}, i.e. FPS = 1000/(32.5/3) ≈ 92 FPS (almost 100 FPS; presumably the gap comes from the lower GPU latency)?

  3. Can you also point me to an example/tutorial on how to run this parallel computing with the GPU and 2 DLAs?
    Thanks.

Sure.

1. Flash Xavier with JetPack 4.3 and maximize the performance with:

sudo nvpmodel -m 0
sudo jetson_clocks

2. Generate yolov3.onnx by following the README in /usr/src/tensorrt/samples/python/yolov3_onnx/.

3. Run yolov3.onnx with trtexec, which is designed for profiling.

/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=26 --int8

Thanks.

Hi AastaLLL,

We tried running trtexec on the GPU; the command is as follows:

trtexec --onnx=yolov3_608.onnx --workspace=26 --int8

and the resulting information is:

&&&& RUNNING TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --int8
[00/10/2020-15:36:16] [I] === Model Options ===
[00/10/2020-15:36:16] [I] Format: ONNX
[00/10/2020-15:36:16] [I] Model: /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
[00/10/2020-15:36:16] [I] Output:
[00/10/2020-15:36:16] [I] === Build Options ===
[00/10/2020-15:36:16] [I] Max batch: 1
[00/10/2020-15:36:16] [I] Workspace: 26 MB
[00/10/2020-15:36:16] [I] minTiming: 1
[00/10/2020-15:36:16] [I] avgTiming: 8
[00/10/2020-15:36:16] [I] Precision: INT8
[00/10/2020-15:36:16] [I] Calibration: Dynamic
[00/10/2020-15:36:16] [I] Safe mode: Disabled
[00/10/2020-15:36:16] [I] Save engine: 
[00/10/2020-15:36:16] [I] Load engine: 
[00/10/2020-15:36:16] [I] Inputs format: fp32:CHW
[00/10/2020-15:36:16] [I] Outputs format: fp32:CHW
[00/10/2020-15:36:16] [I] Input build shapes: model
[00/10/2020-15:36:16] [I] === System Options ===
[00/10/2020-15:36:16] [I] Device: 0
[00/10/2020-15:36:16] [I] DLACore: 
[00/10/2020-15:36:16] [I] Plugins:
[00/10/2020-15:36:16] [I] === Inference Options ===
[00/10/2020-15:36:16] [I] Batch: 1
[00/10/2020-15:36:16] [I] Iterations: 10 (200 ms warm up)
[00/10/2020-15:36:16] [I] Duration: 10s
[00/10/2020-15:36:16] [I] Sleep time: 0ms
[00/10/2020-15:36:16] [I] Streams: 1
[00/10/2020-15:36:16] [I] Spin-wait: Disabled
[00/10/2020-15:36:16] [I] Multithreading: Enabled
[00/10/2020-15:36:16] [I] CUDA Graph: Disabled
[00/10/2020-15:36:16] [I] Skip inference: Disabled
[00/10/2020-15:36:16] [I] Input inference shapes: model
[00/10/2020-15:36:16] [I] === Reporting Options ===
[00/10/2020-15:36:16] [I] Verbose: Disabled
[00/10/2020-15:36:16] [I] Averages: 10 inferences
[00/10/2020-15:36:16] [I] Percentile: 99
[00/10/2020-15:36:16] [I] Dump output: Disabled
[00/10/2020-15:36:16] [I] Profile: Disabled
[00/10/2020-15:36:16] [I] Export timing to JSON file: 
[00/10/2020-15:36:16] [I] Export profile to JSON file: 
[00/10/2020-15:36:16] [I] 
----------------------------------------------------------------
Input filename:   /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    NVIDIA TensorRT sample
Producer version: 
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[00/10/2020-15:36:18] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[00/10/2020-15:36:18] [I] [TRT] 
[00/10/2020-15:36:18] [I] [TRT] --------------- Layers running on DLA: 
[00/10/2020-15:36:18] [I] [TRT] 
[00/10/2020-15:36:18] [I] [TRT] --------------- Layers running on GPU: 
[00/10/2020-15:36:18] [I] [TRT] (Unnamed Layer* 0) [Convolution], (Unnamed Layer* 2) [Activation], (Unnamed Layer* 3) [Convolution], (Unnamed Layer* 5) [Activation], (Unnamed Layer* 6) [Convolution], (Unnamed Layer* 8) [Activation], (Unnamed Layer* 9) [Convolution], (Unnamed Layer* 11) [Activation], (Unnamed Layer* 12) [ElementWise], (Unnamed Layer* 13) [Convolution], (Unnamed Layer* 15) [Activation], (Unnamed Layer* 16) [Convolution], (Unnamed Layer* 18) [Activation], (Unnamed Layer* 19) [Convolution], (Unnamed Layer* 21) [Activation], (Unnamed Layer* 22) [ElementWise], (Unnamed Layer* 23) [Convolution], (Unnamed Layer* 25) [Activation], (Unnamed Layer* 26) [Convolution], (Unnamed Layer* 28) [Activation], (Unnamed Layer* 29) [ElementWise], (Unnamed Layer* 30) [Convolution], (Unnamed Layer* 32) [Activation], (Unnamed Layer* 33) [Convolution], (Unnamed Layer* 35) [Activation], (Unnamed Layer* 36) [Convolution], (Unnamed Layer* 38) [Activation], (Unnamed Layer* 39) [ElementWise], (Unnamed Layer* 40) [Convolution], (Unnamed Layer* 42) [Activation], (Unnamed Layer* 43) [Convolution], (Unnamed Layer* 45) [Activation], (Unnamed Layer* 46) [ElementWise], (Unnamed Layer* 47) [Convolution], (Unnamed Layer* 49) [Activation], (Unnamed Layer* 50) [Convolution], (Unnamed Layer* 52) [Activation], (Unnamed Layer* 53) [ElementWise], (Unnamed Layer* 54) [Convolution], (Unnamed Layer* 56) [Activation], (Unnamed Layer* 57) [Convolution], (Unnamed Layer* 59) [Activation], (Unnamed Layer* 60) [ElementWise], (Unnamed Layer* 61) [Convolution], (Unnamed Layer* 63) [Activation], (Unnamed Layer* 64) [Convolution], (Unnamed Layer* 66) [Activation], (Unnamed Layer* 67) [ElementWise], (Unnamed Layer* 68) [Convolution], (Unnamed Layer* 70) [Activation], (Unnamed Layer* 71) [Convolution], (Unnamed Layer* 73) [Activation], (Unnamed Layer* 74) [ElementWise], (Unnamed Layer* 75) [Convolution], (Unnamed Layer* 77) [Activation], (Unnamed Layer* 78) [Convolution], (Unnamed Layer* 80) [Activation], 
(Unnamed Layer* 81) [ElementWise], (Unnamed Layer* 82) [Convolution], (Unnamed Layer* 84) [Activation], (Unnamed Layer* 85) [Convolution], (Unnamed Layer* 87) [Activation], (Unnamed Layer* 88) [ElementWise], (Unnamed Layer* 89) [Convolution], (Unnamed Layer* 91) [Activation], (Unnamed Layer* 92) [Convolution], (Unnamed Layer* 94) [Activation], (Unnamed Layer* 95) [Convolution], (Unnamed Layer* 97) [Activation], (Unnamed Layer* 98) [ElementWise], (Unnamed Layer* 99) [Convolution], (Unnamed Layer* 101) [Activation], (Unnamed Layer* 102) [Convolution], (Unnamed Layer* 104) [Activation], (Unnamed Layer* 105) [ElementWise], (Unnamed Layer* 106) [Convolution], (Unnamed Layer* 108) [Activation], (Unnamed Layer* 109) [Convolution], (Unnamed Layer* 111) [Activation], (Unnamed Layer* 112) [ElementWise], (Unnamed Layer* 113) [Convolution], (Unnamed Layer* 115) [Activation], (Unnamed Layer* 116) [Convolution], (Unnamed Layer* 118) [Activation], (Unnamed Layer* 119) [ElementWise], (Unnamed Layer* 120) [Convolution], (Unnamed Layer* 122) [Activation], (Unnamed Layer* 123) [Convolution], (Unnamed Layer* 125) [Activation], (Unnamed Layer* 126) [ElementWise], (Unnamed Layer* 127) [Convolution], (Unnamed Layer* 129) [Activation], (Unnamed Layer* 130) [Convolution], (Unnamed Layer* 132) [Activation], (Unnamed Layer* 133) [ElementWise], (Unnamed Layer* 134) [Convolution], (Unnamed Layer* 136) [Activation], (Unnamed Layer* 137) [Convolution], (Unnamed Layer* 139) [Activation], (Unnamed Layer* 140) [ElementWise], (Unnamed Layer* 141) [Convolution], (Unnamed Layer* 143) [Activation], (Unnamed Layer* 144) [Convolution], (Unnamed Layer* 146) [Activation], (Unnamed Layer* 147) [ElementWise], (Unnamed Layer* 148) [Convolution], (Unnamed Layer* 150) [Activation], (Unnamed Layer* 151) [Convolution], (Unnamed Layer* 153) [Activation], (Unnamed Layer* 154) [Convolution], (Unnamed Layer* 156) [Activation], (Unnamed Layer* 157) [ElementWise], (Unnamed Layer* 158) [Convolution], (Unnamed Layer* 
160) [Activation], (Unnamed Layer* 161) [Convolution], (Unnamed Layer* 163) [Activation], (Unnamed Layer* 164) [ElementWise], (Unnamed Layer* 165) [Convolution], (Unnamed Layer* 167) [Activation], (Unnamed Layer* 168) [Convolution], (Unnamed Layer* 170) [Activation], (Unnamed Layer* 171) [ElementWise], (Unnamed Layer* 172) [Convolution], (Unnamed Layer* 174) [Activation], (Unnamed Layer* 175) [Convolution], (Unnamed Layer* 177) [Activation], (Unnamed Layer* 178) [ElementWise], (Unnamed Layer* 179) [Convolution], (Unnamed Layer* 181) [Activation], (Unnamed Layer* 182) [Convolution], (Unnamed Layer* 184) [Activation], (Unnamed Layer* 185) [Convolution], (Unnamed Layer* 187) [Activation], (Unnamed Layer* 188) [Convolution], (Unnamed Layer* 190) [Activation], (Unnamed Layer* 191) [Convolution], (Unnamed Layer* 193) [Activation], (Unnamed Layer* 194) [Convolution], (Unnamed Layer* 196) [Activation], (Unnamed Layer* 197) [Convolution], (Unnamed Layer* 198) [Convolution], (Unnamed Layer* 200) [Activation], (Unnamed Layer* 201) [Resize], 086_upsample copy, (Unnamed Layer* 203) [Convolution], (Unnamed Layer* 205) [Activation], (Unnamed Layer* 206) [Convolution], (Unnamed Layer* 208) [Activation], (Unnamed Layer* 209) [Convolution], (Unnamed Layer* 211) [Activation], (Unnamed Layer* 212) [Convolution], (Unnamed Layer* 214) [Activation], (Unnamed Layer* 215) [Convolution], (Unnamed Layer* 217) [Activation], (Unnamed Layer* 218) [Convolution], (Unnamed Layer* 220) [Activation], (Unnamed Layer* 221) [Convolution], (Unnamed Layer* 222) [Convolution], (Unnamed Layer* 224) [Activation], (Unnamed Layer* 225) [Resize], 098_upsample copy, (Unnamed Layer* 227) [Convolution], (Unnamed Layer* 229) [Activation], (Unnamed Layer* 230) [Convolution], (Unnamed Layer* 232) [Activation], (Unnamed Layer* 233) [Convolution], (Unnamed Layer* 235) [Activation], (Unnamed Layer* 236) [Convolution], (Unnamed Layer* 238) [Activation], (Unnamed Layer* 239) [Convolution], (Unnamed Layer* 241) 
[Activation], (Unnamed Layer* 242) [Convolution], (Unnamed Layer* 244) [Activation], (Unnamed Layer* 245) [Convolution], 
[00/10/2020-15:36:21] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[00/10/2020-15:39:46] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[00/10/2020-15:39:47] [I] Average over 10 runs is 25.7592 ms (host walltime is 25.8521 ms, 99% percentile time is 26.4878).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.8782 ms (host walltime is 27.0092 ms, 99% percentile time is 33.6098).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.711 ms (host walltime is 26.8539 ms, 99% percentile time is 33.9163).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.6445 ms (host walltime is 26.7936 ms, 99% percentile time is 32.8204).
[00/10/2020-15:39:48] [I] Average over 10 runs is 26.6838 ms (host walltime is 26.8311 ms, 99% percentile time is 33.1121).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.7551 ms (host walltime is 26.8875 ms, 99% percentile time is 33.6251).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.9908 ms (host walltime is 27.1109 ms, 99% percentile time is 32.6678).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.7439 ms (host walltime is 26.8641 ms, 99% percentile time is 33.1622).
[00/10/2020-15:39:49] [I] Average over 10 runs is 26.9623 ms (host walltime is 27.0811 ms, 99% percentile time is 33.2608).
[00/10/2020-15:39:50] [I] Average over 10 runs is 26.7927 ms (host walltime is 26.8805 ms, 99% percentile time is 33.184).
&&&& PASSED TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --int8

And we tried running trtexec on the DLA; the command is as follows:

trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --useDLACore=1 --int8
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --useDLACore=1 --int8
[00/10/2020-16:04:24] [I] === Model Options ===
[00/10/2020-16:04:24] [I] Format: ONNX
[00/10/2020-16:04:24] [I] Model: /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
[00/10/2020-16:04:24] [I] Output:
[00/10/2020-16:04:24] [I] === Build Options ===
[00/10/2020-16:04:24] [I] Max batch: 1
[00/10/2020-16:04:24] [I] Workspace: 26 MB
[00/10/2020-16:04:24] [I] minTiming: 1
[00/10/2020-16:04:24] [I] avgTiming: 8
[00/10/2020-16:04:24] [I] Precision: INT8
[00/10/2020-16:04:24] [I] Calibration: Dynamic
[00/10/2020-16:04:24] [I] Safe mode: Disabled
[00/10/2020-16:04:24] [I] Save engine: 
[00/10/2020-16:04:24] [I] Load engine: 
[00/10/2020-16:04:24] [I] Inputs format: fp32:CHW
[00/10/2020-16:04:24] [I] Outputs format: fp32:CHW
[00/10/2020-16:04:24] [I] Input build shapes: model
[00/10/2020-16:04:24] [I] === System Options ===
[00/10/2020-16:04:24] [I] Device: 0
[00/10/2020-16:04:24] [I] DLACore: 1
[00/10/2020-16:04:24] [I] Plugins:
[00/10/2020-16:04:24] [I] === Inference Options ===
[00/10/2020-16:04:24] [I] Batch: 1
[00/10/2020-16:04:24] [I] Iterations: 10 (200 ms warm up)
[00/10/2020-16:04:24] [I] Duration: 10s
[00/10/2020-16:04:24] [I] Sleep time: 0ms
[00/10/2020-16:04:24] [I] Streams: 1
[00/10/2020-16:04:24] [I] Spin-wait: Disabled
[00/10/2020-16:04:24] [I] Multithreading: Enabled
[00/10/2020-16:04:24] [I] CUDA Graph: Disabled
[00/10/2020-16:04:24] [I] Skip inference: Disabled
[00/10/2020-16:04:24] [I] Input inference shapes: model
[00/10/2020-16:04:24] [I] === Reporting Options ===
[00/10/2020-16:04:24] [I] Verbose: Disabled
[00/10/2020-16:04:24] [I] Averages: 10 inferences
[00/10/2020-16:04:24] [I] Percentile: 99
[00/10/2020-16:04:24] [I] Dump output: Disabled
[00/10/2020-16:04:24] [I] Profile: Disabled
[00/10/2020-16:04:24] [I] Export timing to JSON file: 
[00/10/2020-16:04:24] [I] Export profile to JSON file: 
[00/10/2020-16:04:24] [I] 
----------------------------------------------------------------
Input filename:   /home/nvidia/work/yolov3_onnx/yolov3_608.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    NVIDIA TensorRT sample
Producer version: 
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[00/10/2020-16:04:26] [E] [TRT] (Unnamed Layer* 2) [Activation]: ActivationLayer (with ActivationType = LEAKY_RELU) not supported for DLA.
[00/10/2020-16:04:26] [E] [TRT] Default DLA is enabled but layer (Unnamed Layer* 2) [Activation] is not supported on DLA and falling back to GPU is not enabled.
[00/10/2020-16:04:26] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # trtexec --onnx=/home/nvidia/work/yolov3_onnx/yolov3_608.onnx --workspace=26 --useDLACore=1 --int8

Some layers, such as LEAKY_RELU, are not supported on the DLA, but this document reports that LEAKY_RELU is supported: https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html

How can we solve this issue?

Hi,

The page you shared is the TensorRT support matrix.
The DLA support matrix is here:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#dla_layers

Activation Layer
    Functions supported: ReLU, Sigmoid, Hyperbolic Tangent
    Negative slope not supported for ReLU
    Only ReLU operation is supported in INT8

To solve this, try enabling --allowGPUFallback when executing, for example:

/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=26 --useDLACore=1 --int8 --allowGPUFallback

[Activation] is not supported on DLA and falling back to GPU is not enabled.

Thanks.

Thanks for the info.

I ran trtexec with --allowGPUFallback enabled, and it reports 33 ms.

About your last reply:

Xavier is using GPU+2DLA to achieve the throughput.
GPU latency is 26ms and DLA latency is 32.5ms.
Total throughput is 100fps

So, is the 32.5 ms for DLA+GPU (not the DLA alone)? If so, how can Xavier reach 100 FPS?

I am not sure if my idea is correct, because I think it should be:

DLA0 + GPU = 32.5ms
DLA1 + GPU = 32.5ms
1 / (32.5ms) * 2 = 61.5fps

Can the GPU still be used at the same time?

Hi,

You are missing one more instance that runs purely on the GPU.
Try this command at the same time:

/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=26 --int8

Thanks.

Sorry I'm late.

I have tried this command, but it only executes on the GPU, without the DLAs.

I hope to execute the two DLAs and one GPU simultaneously, as you described, for a speed of 100 FPS.

Do you have an example of executing two DLAs and one GPU simultaneously?

Hi,

A TensorRT engine can only be deployed to a single target.
So you will need to run them as separate processes to reach 100 FPS throughput.

In short, run the following commands in different consoles simultaneously.

/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=26 --int8
/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=26 --useDLACore=0 --int8
/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=26 --useDLACore=1 --int8

However, please note that this benchmark targets throughput.
There is no synchronization mechanism among the processes.
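Opening three consoles by hand can also be scripted. A minimal Python sketch (the launcher is hypothetical helper code, not part of TensorRT; the trtexec invocations are the ones above):

```python
import subprocess

def run_concurrently(cmds):
    """Launch each command as its own OS process and wait for all of them.

    This mirrors running the commands in separate consoles: the processes
    share the machine but are not synchronized with each other.
    """
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    return [p.wait() for p in procs]

# For the benchmark above, the three engines would be launched like:
# run_concurrently([
#     ["/usr/src/tensorrt/bin/trtexec", "--onnx=./yolov3.onnx", "--workspace=26", "--int8"],
#     ["/usr/src/tensorrt/bin/trtexec", "--onnx=./yolov3.onnx", "--workspace=26", "--useDLACore=0", "--int8"],
#     ["/usr/src/tensorrt/bin/trtexec", "--onnx=./yolov3.onnx", "--workspace=26", "--useDLACore=1", "--int8"],
# ])
```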

Thanks.

Sorry I’m late,

We have tried the commands in different consoles simultaneously, as in the following figure:

If we run only the GPU case, FPS = 35, but when we run GPU -> DLA0 -> DLA1 concurrently, the speed slows down.

The FPS curve is shown in the following figure:

We also tried the original trtexec execution and hit the same problem.

We still can't get 100 FPS when using the two DLAs and the GPU.

Where is the problem?

Up

Hi,

Sorry for the late update. Some information was missing from this topic.

Please note that there are some threading/clock issues with trtexec in TensorRT 6.0.
You will need to enable the spin-wait flag to get better performance.

/usr/src/tensorrt/bin/trtexec --onnx=./yolov3.onnx --workspace=26 --int8 --useSpinWait

For benchmarking, please also maximize the device performance first:

sudo nvpmodel -m 0
sudo jetson_clocks

You can also experiment with the workspace size to get the best throughput on YOLOv3.

Thanks.

Hi AastaLLL and everyone,

We have tried the above command parameters; the result is shown in the figure.

It looks like the same FPS issue.

We have created a new topic at https://devtalk.nvidia.com/default/topic/1072834/jetson-agx-xavier/how-to-use-gpu-2-dla-can-be-100fps-for-yolov3-on-xavier/ and moved the issue there; please follow the new topic.

Thank you.