Cannot build a TensorRT engine for DLA from a large ONNX file

Hello guys,
I tried to build a TensorRT engine for DLA from the following ONNX model with trtexec, but it failed. Is there a solution?

Model:
https://digital-standard.com/threedpose/models/Resnet34_3inputs_448x448_20200609.onnx

Machine:
Jetson Xavier NX DevKit with JetPack 4.4

Command:

/usr/src/tensorrt/bin/trtexec --onnx=Resnet34_3inputs_448x448_20200609.onnx --explicitBatch --batch=1 --workspace=4096 --saveEngine=test.engine --fp16 --useDLACore=0 --allowGPUFallback

Error:

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=Resnet34_3inputs_448x448_20200609.onnx --explicitBatch --batch=1 --workspace=4096 --saveEngine=test.engine --fp16 --useDLACore=0 --allowGPUFallback
[07/31/2020-12:31:08] [I] === Model Options ===
[07/31/2020-12:31:08] [I] Format: ONNX
[07/31/2020-12:31:08] [I] Model: Resnet34_3inputs_448x448_20200609.onnx
[07/31/2020-12:31:08] [I] Output:
[07/31/2020-12:31:08] [I] === Build Options ===
[07/31/2020-12:31:08] [I] Max batch: explicit
[07/31/2020-12:31:08] [I] Workspace: 4096 MB
[07/31/2020-12:31:08] [I] minTiming: 1
[07/31/2020-12:31:08] [I] avgTiming: 8
[07/31/2020-12:31:08] [I] Precision: FP32+FP16
[07/31/2020-12:31:08] [I] Calibration:
[07/31/2020-12:31:08] [I] Safe mode: Disabled
[07/31/2020-12:31:08] [I] Save engine: test.engine
[07/31/2020-12:31:08] [I] Load engine:
[07/31/2020-12:31:08] [I] Builder Cache: Enabled
[07/31/2020-12:31:08] [I] NVTX verbosity: 0
[07/31/2020-12:31:08] [I] Inputs format: fp32:CHW
[07/31/2020-12:31:08] [I] Outputs format: fp32:CHW
[07/31/2020-12:31:08] [I] Input build shapes: model
[07/31/2020-12:31:08] [I] Input calibration shapes: model
[07/31/2020-12:31:08] [I] === System Options ===
[07/31/2020-12:31:08] [I] Device: 0
[07/31/2020-12:31:08] [I] DLACore: 0(With GPU fallback)
[07/31/2020-12:31:08] [I] Plugins:
[07/31/2020-12:31:08] [I] === Inference Options ===
[07/31/2020-12:31:08] [I] Batch: Explicit
[07/31/2020-12:31:08] [I] Input inference shapes: model
[07/31/2020-12:31:08] [I] Iterations: 10
[07/31/2020-12:31:08] [I] Duration: 3s (+ 200ms warm up)
[07/31/2020-12:31:08] [I] Sleep time: 0ms
[07/31/2020-12:31:08] [I] Streams: 1
[07/31/2020-12:31:08] [I] ExposeDMA: Disabled
[07/31/2020-12:31:08] [I] Spin-wait: Disabled
[07/31/2020-12:31:08] [I] Multithreading: Disabled
[07/31/2020-12:31:08] [I] CUDA Graph: Disabled
[07/31/2020-12:31:08] [I] Skip inference: Disabled
[07/31/2020-12:31:08] [I] Inputs:
[07/31/2020-12:31:08] [I] === Reporting Options ===
[07/31/2020-12:31:08] [I] Verbose: Disabled
[07/31/2020-12:31:08] [I] Averages: 10 inferences
[07/31/2020-12:31:08] [I] Percentile: 99
[07/31/2020-12:31:08] [I] Dump output: Disabled
[07/31/2020-12:31:08] [I] Profile: Disabled
[07/31/2020-12:31:08] [I] Export timing to JSON file:
[07/31/2020-12:31:08] [I] Export output to JSON file:
[07/31/2020-12:31:08] [I] Export profile to JSON file:
[07/31/2020-12:31:08] [I]
----------------------------------------------------------------
Input filename:   Resnet34_3inputs_448x448_20200609.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.3
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[07/31/2020-12:31:11] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 154) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 154) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 160) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 160) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 165) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 165) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 171) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 171) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 177) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 177) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:13] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 186) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:14] [W] [TRT] Internal DLA error for layer (Unnamed Layer* 186) [Convolution]. Switching to GPU fallback.
[07/31/2020-12:31:14] [I] [TRT]
[07/31/2020-12:31:14] [I] [TRT] --------------- Layers running on DLA:
[07/31/2020-12:31:14] [I] [TRT] {(Unnamed Layer* 0) [Convolution],(Unnamed Layer* 1) [Scale],(Unnamed Layer* 2) [Activation],(Unnamed Layer* 3) [Pooling],(Unnamed Layer* 4) [Convolution],(Unnamed Layer* 5) [Scale],(Unnamed Layer* 6) [Activation],(Unnamed Layer* 7) [Pooling],(Unnamed Layer* 8) [Convolution],(Unnamed Layer* 9) [Scale],(Unnamed Layer* 10) [Activation],(Unnamed Layer* 11) [Pooling],(Unnamed Layer* 12) [ElementWise],(Unnamed Layer* 13) [Concatenation],(Unnamed Layer* 14) [Convolution],(Unnamed Layer* 15) [Scale],(Unnamed Layer* 16) [Activation],(Unnamed Layer* 17) [Convolution],(Unnamed Layer* 18) [Scale],(Unnamed Layer* 19) [Activation],(Unnamed Layer* 20) [Convolution],(Unnamed Layer* 21) [Scale],(Unnamed Layer* 22) [Activation],(Unnamed Layer* 23) [Convolution],(Unnamed Layer* 24) [Scale],(Unnamed Layer* 25) [ElementWise],(Unnamed Layer* 26) [Activation],(Unnamed Layer* 27) [Convolution],(Unnamed Layer* 28) [Scale],(Unnamed Layer* 29) [Activation],(Unnamed Layer* 30) [Convolution],(Unnamed Layer* 31) [Scale],(Unnamed Layer* 32) [ElementWise],(Unnamed Layer* 33) [Activation],(Unnamed Layer* 34) [Convolution],(Unnamed Layer* 35) [Scale],(Unnamed Layer* 36) [Activation],(Unnamed Layer* 37) [Convolution],(Unnamed Layer* 38) [Scale],(Unnamed Layer* 39) [ElementWise],(Unnamed Layer* 40) [Activation],(Unnamed Layer* 41) [ElementWise],(Unnamed Layer* 42) [Convolution],(Unnamed Layer* 43) [Scale],(Unnamed Layer* 44) [Activation],(Unnamed Layer* 45) [Convolution],(Unnamed Layer* 46) [Scale],(Unnamed Layer* 47) [Convolution],(Unnamed Layer* 48) [Scale],(Unnamed Layer* 49) [ElementWise],(Unnamed Layer* 50) [Activation],(Unnamed Layer* 51) [Convolution],(Unnamed Layer* 52) [Scale],(Unnamed Layer* 53) [Activation],(Unnamed Layer* 54) [Convolution],(Unnamed Layer* 55) [Scale],(Unnamed Layer* 56) [ElementWise],(Unnamed Layer* 57) [Activation],(Unnamed Layer* 58) [Convolution],(Unnamed Layer* 59) [Scale],(Unnamed Layer* 60) [Activation],(Unnamed Layer* 61) 
[Convolution],(Unnamed Layer* 62) [Scale],(Unnamed Layer* 63) [ElementWise],(Unnamed Layer* 64) [Activation],(Unnamed Layer* 65) [Convolution],(Unnamed Layer* 66) [Scale],(Unnamed Layer* 67) [Activation],(Unnamed Layer* 68) [Convolution],(Unnamed Layer* 69) [Scale],(Unnamed Layer* 70) [ElementWise],(Unnamed Layer* 71) [Activation],(Unnamed Layer* 72) [Convolution],(Unnamed Layer* 73) [Scale],(Unnamed Layer* 74) [Activation],(Unnamed Layer* 75) [Convolution],(Unnamed Layer* 76) [Scale],(Unnamed Layer* 77) [Convolution],(Unnamed Layer* 78) [Scale],(Unnamed Layer* 79) [ElementWise],(Unnamed Layer* 80) [Activation],(Unnamed Layer* 81) [Convolution],(Unnamed Layer* 82) [Scale],(Unnamed Layer* 83) [Activation],(Unnamed Layer* 84) [Convolution],(Unnamed Layer* 85) [Scale],(Unnamed Layer* 86) [ElementWise],(Unnamed Layer* 87) [Activation],(Unnamed Layer* 88) [Convolution],(Unnamed Layer* 89) [Scale],(Unnamed Layer* 90) [Activation],(Unnamed Layer* 91) [Convolution],(Unnamed Layer* 92) [Scale],(Unnamed Layer* 93) [ElementWise],(Unnamed Layer* 94) [Activation],(Unnamed Layer* 95) [Convolution],(Unnamed Layer* 96) [Scale],(Unnamed Layer* 97) [Activation],(Unnamed Layer* 98) [Convolution],(Unnamed Layer* 99) [Scale],(Unnamed Layer* 100) [ElementWise],(Unnamed Layer* 101) [Activation],(Unnamed Layer* 102) [Convolution],(Unnamed Layer* 103) [Scale],(Unnamed Layer* 104) [Activation],(Unnamed Layer* 105) [Convolution],(Unnamed Layer* 106) [Scale],(Unnamed Layer* 107) [ElementWise],(Unnamed Layer* 108) [Activation],(Unnamed Layer* 109) [Convolution],(Unnamed Layer* 110) [Scale],(Unnamed Layer* 111) [Activation],(Unnamed Layer* 112) [Convolution],(Unnamed Layer* 113) [Scale],(Unnamed Layer* 114) [ElementWise],(Unnamed Layer* 115) [Activation],(Unnamed Layer* 116) [Convolution],(Unnamed Layer* 117) [Scale],(Unnamed Layer* 118) [Activation],(Unnamed Layer* 119) [Convolution],(Unnamed Layer* 120) [Scale],(Unnamed Layer* 121) [Convolution],(Unnamed Layer* 122) [Scale],(Unnamed Layer* 
123) [ElementWise],(Unnamed Layer* 124) [Activation],(Unnamed Layer* 125) [Convolution],(Unnamed Layer* 126) [Scale],(Unnamed Layer* 127) [Activation],(Unnamed Layer* 128) [Convolution],(Unnamed Layer* 129) [Scale],(Unnamed Layer* 130) [ElementWise],(Unnamed Layer* 131) [Activation],(Unnamed Layer* 132) [Convolution],(Unnamed Layer* 133) [Scale],(Unnamed Layer* 134) [Activation],(Unnamed Layer* 135) [Convolution],(Unnamed Layer* 136) [Scale],(Unnamed Layer* 137) [ElementWise],(Unnamed Layer* 138) [Activation],(Unnamed Layer* 139) [Convolution],(Unnamed Layer* 140) [Scale],(Unnamed Layer* 141) [Activation],(Unnamed Layer* 142) [Convolution],(Unnamed Layer* 143) [Scale],(Unnamed Layer* 144) [Activation],(Unnamed Layer* 145) [Deconvolution],(Unnamed Layer* 146) [Scale],(Unnamed Layer* 147) [Activation],(Unnamed Layer* 148) [Convolution],(Unnamed Layer* 149) [Scale],(Unnamed Layer* 150) [Activation],(Unnamed Layer* 151) [Convolution],(Unnamed Layer* 152) [Scale],(Unnamed Layer* 153) [Activation]}, {(Unnamed Layer* 155) [Scale],(Unnamed Layer* 156) [Activation],(Unnamed Layer* 157) [Convolution],(Unnamed Layer* 158) [Scale],(Unnamed Layer* 159) [Activation]}, {(Unnamed Layer* 161) [Scale],(Unnamed Layer* 162) [Activation],(Unnamed Layer* 163) [Convolution],(Unnamed Layer* 164) [Activation]}, {(Unnamed Layer* 166) [Scale],(Unnamed Layer* 167) [Activation],(Unnamed Layer* 168) [Convolution],(Unnamed Layer* 169) [Scale],(Unnamed Layer* 170) [Activation]}, {(Unnamed Layer* 172) [Scale],(Unnamed Layer* 173) [Activation],(Unnamed Layer* 174) [Convolution],(Unnamed Layer* 175) [Scale],(Unnamed Layer* 176) [Activation]}, {(Unnamed Layer* 178) [Scale],(Unnamed Layer* 179) [Activation],(Unnamed Layer* 180) [Convolution],(Unnamed Layer* 181) [Scale],(Unnamed Layer* 182) [Activation],(Unnamed Layer* 183) [Scale],(Unnamed Layer* 184) [Activation],(Unnamed Layer* 185) [Concatenation]}, {(Unnamed Layer* 187) [Scale],(Unnamed Layer* 188) [Activation],(Unnamed Layer* 189) 
[Convolution],(Unnamed Layer* 190) [Activation]},
[07/31/2020-12:31:14] [I] [TRT] --------------- Layers running on GPU:
[07/31/2020-12:31:14] [I] [TRT] (Unnamed Layer* 154) [Convolution], (Unnamed Layer* 160) [Convolution], (Unnamed Layer* 165) [Convolution], (Unnamed Layer* 171) [Convolution], (Unnamed Layer* 177) [Convolution], (Unnamed Layer* 186) [Convolution],
[07/31/2020-12:31:18] [W] [TRT] DLA Node compilation Failed.
[07/31/2020-12:31:18] [W] [TRT] DLA Node compilation Failed.
[07/31/2020-12:31:18] [E] [TRT] Try increasing the workspace size with IBuilderConfig::setMaxWorkspaceSize() if using IBuilder::buildEngineWithConfig, or IBuilder::setMaxWorkspaceSize() if using IBuilder::buildCudaEngine.
[07/31/2020-12:31:18] [E] [TRT] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 (Could not find any implementation for node {(Unnamed Layer* 0) [Convolution],(Unnamed Layer* 1) [Scale],(Unnamed Layer* 2) [Activation],(Unnamed Layer* 3) [Pooling],(Unnamed Layer* 4) [Convolution],(Unnamed Layer* 5) [Scale],(Unnamed Layer* 6) [Activation],(Unnamed Layer* 7) [Pooling],(Unnamed Layer* 8) [Convolution],(Unnamed Layer* 9) [Scale],(Unnamed Layer* 10) [Activation],(Unnamed Layer* 11) [Pooling],(Unnamed Layer* 12) [ElementWise],(Unnamed Layer* 13) [Concatenation],(Unnamed Layer* 14) [Convolution],(Unnamed Layer* 15) [Scale],(Unnamed Layer* 16) [Activation],(Unnamed Layer* 17) [Convolution],(Unnamed Layer* 18) [Scale],(Unnamed Layer* 19) [Activation],(Unnamed Layer* 20) [Convolution],(Unnamed Layer* 21) [Scale],(Unnamed Layer* 22) [Activation],(Unnamed Layer* 23) [Convolution],(Unnamed Layer* 24) [Scale],(Unnamed Layer* 25) [ElementWise],(Unnamed Layer* 26) [Activation],(Unnamed Layer* 27) [Convolution],(Unnamed Layer* 28) [Scale],(Unnamed Layer* 29) [Activation],(Unnamed Layer* 30) [Convolution],(Unnamed Layer* 31) [Scale],(Unnamed Layer* 32) [ElementWise],(Unnamed Layer* 33) [Activation],(Unnamed Layer* 34) [Convolution],(Unnamed Layer* 35) [Scale],(Unnamed Layer* 36) [Activation],(Unnamed Layer* 37) [Convolution],(Unnamed Layer* 38) [Scale],(Unnamed Layer* 39) [ElementWise],(Unnamed Layer* 40) [Activation],(Unnamed Layer* 41) [ElementWise],(Unnamed Layer* 42) [Convolution],(Unnamed Layer* 43) [Scale],(Unnamed Layer* 44) [Activation],(Unnamed Layer* 45) [Convolution],(Unnamed Layer* 46) [Scale],(Unnamed Layer* 47) [Convolution],(Unnamed Layer* 48) [Scale],(Unnamed Layer* 49) [ElementWise],(Unnamed Layer* 50) [Activation],(Unnamed Layer* 51) [Convolution],(Unnamed Layer* 52) [Scale],(Unnamed Layer* 53) [Activation],(Unnamed Layer* 54) [Convolution],(Unnamed Layer* 55) [Scale],(Unnamed Layer* 56) [ElementWise],(Unnamed Layer* 57) 
[Activation],(Unnamed Layer* 58) [Convolution],(Unnamed Layer* 59) [Scale],(Unnamed Layer* 60) [Activation],(Unnamed Layer* 61) [Convolution],(Unnamed Layer* 62) [Scale],(Unnamed Layer* 63) [ElementWise],(Unnamed Layer* 64) [Activation],(Unnamed Layer* 65) [Convolution],(Unnamed Layer* 66) [Scale],(Unnamed Layer* 67) [Activation],(Unnamed Layer* 68) [Convolution],(Unnamed Layer* 69) [Scale],(Unnamed Layer* 70) [ElementWise],(Unnamed Layer* 71) [Activation],(Unnamed Layer* 72) [Convolution],(Unnamed Layer* 73) [Scale],(Unnamed Layer* 74) [Activation],(Unnamed Layer* 75) [Convolution],(Unnamed Layer* 76) [Scale],(Unnamed Layer* 77) [Convolution],(Unnamed Layer* 78) [Scale],(Unnamed Layer* 79) [ElementWise],(Unnamed Layer* 80) [Activation],(Unnamed Layer* 81) [Convolution],(Unnamed Layer* 82) [Scale],(Unnamed Layer* 83) [Activation],(Unnamed Layer* 84) [Convolution],(Unnamed Layer* 85) [Scale],(Unnamed Layer* 86) [ElementWise],(Unnamed Layer* 87) [Activation],(Unnamed Layer* 88) [Convolution],(Unnamed Layer* 89) [Scale],(Unnamed Layer* 90) [Activation],(Unnamed Layer* 91) [Convolution],(Unnamed Layer* 92) [Scale],(Unnamed Layer* 93) [ElementWise],(Unnamed Layer* 94) [Activation],(Unnamed Layer* 95) [Convolution],(Unnamed Layer* 96) [Scale],(Unnamed Layer* 97) [Activation],(Unnamed Layer* 98) [Convolution],(Unnamed Layer* 99) [Scale],(Unnamed Layer* 100) [ElementWise],(Unnamed Layer* 101) [Activation],(Unnamed Layer* 102) [Convolution],(Unnamed Layer* 103) [Scale],(Unnamed Layer* 104) [Activation],(Unnamed Layer* 105) [Convolution],(Unnamed Layer* 106) [Scale],(Unnamed Layer* 107) [ElementWise],(Unnamed Layer* 108) [Activation],(Unnamed Layer* 109) [Convolution],(Unnamed Layer* 110) [Scale],(Unnamed Layer* 111) [Activation],(Unnamed Layer* 112) [Convolution],(Unnamed Layer* 113) [Scale],(Unnamed Layer* 114) [ElementWise],(Unnamed Layer* 115) [Activation],(Unnamed Layer* 116) [Convolution],(Unnamed Layer* 117) [Scale],(Unnamed Layer* 118) [Activation],(Unnamed Layer* 
119) [Convolution],(Unnamed Layer* 120) [Scale],(Unnamed Layer* 121) [Convolution],(Unnamed Layer* 122) [Scale],(Unnamed Layer* 123) [ElementWise],(Unnamed Layer* 124) [Activation],(Unnamed Layer* 125) [Convolution],(Unnamed Layer* 126) [Scale],(Unnamed Layer* 127) [Activation],(Unnamed Layer* 128) [Convolution],(Unnamed Layer* 129) [Scale],(Unnamed Layer* 130) [ElementWise],(Unnamed Layer* 131) [Activation],(Unnamed Layer* 132) [Convolution],(Unnamed Layer* 133) [Scale],(Unnamed Layer* 134) [Activation],(Unnamed Layer* 135) [Convolution],(Unnamed Layer* 136) [Scale],(Unnamed Layer* 137) [ElementWise],(Unnamed Layer* 138) [Activation],(Unnamed Layer* 139) [Convolution],(Unnamed Layer* 140) [Scale],(Unnamed Layer* 141) [Activation],(Unnamed Layer* 142) [Convolution],(Unnamed Layer* 143) [Scale],(Unnamed Layer* 144) [Activation],(Unnamed Layer* 145) [Deconvolution],(Unnamed Layer* 146) [Scale],(Unnamed Layer* 147) [Activation],(Unnamed Layer* 148) [Convolution],(Unnamed Layer* 149) [Scale],(Unnamed Layer* 150) [Activation],(Unnamed Layer* 151) [Convolution],(Unnamed Layer* 152) [Scale],(Unnamed Layer* 153) [Activation]}.)
[07/31/2020-12:31:19] [E] [TRT] ../builder/tacticOptimizer.cpp (1715) - TRTInternal Error in computeCosts: 0 ()
[07/31/2020-12:31:19] [E] Engine creation failed
[07/31/2020-12:31:19] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=Resnet34_3inputs_448x448_20200609.onnx --explicitBatch --batch=1 --workspace=4096 --saveEngine=test.engine --fp16 --useDLACore=0 --allowGPUFallback

Hi,

It looks like the workspace is not large enough for TensorRT to build the model.

Have you run the model in a desktop environment before?
If so, please check how much memory inference requires first.

Thanks.

I don’t have a desktop GPU with DLA, though I have run the model on a GTX 1080 Ti.
The Jetson Xavier NX was able to run the model without DLA. In that case, RAM usage increased by about 1.1 GB (from 1864 MB to 2997 MB) during inference, as monitored by tegrastats.

I also tried changing the workspace size for trtexec with DLA:

  • 5500 MB: Same error as already reported.
  • 6000 MB: The process was killed for an unknown reason.
  • 7500 MB: CUDA Error in allocate: 2 (out of memory)

Does DLA require so much memory?

Hi,

Do you mean that this model can run directly on the Xavier NX GPU?

In general, DLA automatically falls back to the GPU once its capacity is full.
So this shouldn’t be an issue as long as the model can be inferenced on the GPU.

Thanks.

Hi,

I want to run all or part of the model on DLA so that the GPU is free for 3D graphics. It’s fine if some operations automatically fall back to the GPU.
The problem is that building the engine fails when DLA is enabled through the TensorRT APIs.
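For reference, a minimal sketch of what "DLA enabled with TensorRT APIs" looks like with the TensorRT 7.x Python API, mirroring the trtexec flags used above (`--fp16 --useDLACore=0 --allowGPUFallback --workspace=4096`). The file names are taken from the command in this thread; everything else is a standard builder-config setup, not the exact code used here:

```python
# Sketch: build a DLA engine from an ONNX model with the TensorRT 7.x Python API.
# Mirrors: trtexec --onnx=... --fp16 --useDLACore=0 --allowGPUFallback --workspace=4096
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
# Explicit-batch network, as required for ONNX models (--explicitBatch).
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("Resnet34_3inputs_448x448_20200609.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.max_workspace_size = 4 << 30            # 4 GiB, same as --workspace=4096
config.set_flag(trt.BuilderFlag.FP16)          # DLA requires FP16 or INT8
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)  # --allowGPUFallback
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0                            # --useDLACore=0

engine = builder.build_engine(network, config)
if engine is None:
    raise RuntimeError("Engine build failed")
with open("test.engine", "wb") as f:
    f.write(engine.serialize())
```

This fails at `build_engine` with the same `computeCosts` error as trtexec, since trtexec is just a thin wrapper around these calls.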

Hi,

Thanks for your feedback.

We can reproduce this in our environment, and we are checking with our internal team for suggestions.
We will share more information once we get feedback.

Hi,

We were able to convert a model of similar size with TensorRT 7.1.3, which is integrated into the JetPack 4.4 production release (GA).
Would you mind giving it a try?

You may need to create some swap memory when converting the ONNX model into a TensorRT engine.
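On a Jetson, swap can be added with the standard Linux tools; a common approach looks like the following (the 4 GB size and `/var/swapfile` path are examples, adjust as needed):

```shell
# Create and enable a 4 GB swap file (size and path are examples).
sudo fallocate -l 4G /var/swapfile
sudo chmod 600 /var/swapfile      # swap files must not be world-readable
sudo mkswap /var/swapfile
sudo swapon /var/swapfile

# Verify the swap is active.
free -h

# After the engine is built, the swap can be removed again:
# sudo swapoff /var/swapfile && sudo rm /var/swapfile
```

The extra swap is only needed while building the engine; at inference time the serialized engine needs far less memory.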

Thanks.