Device memory is insufficient to use tactic

Hi. I’m experiencing problems when converting an ONNX model to a TensorRT engine. The Jetson has 24+ GiB of free memory, but the build still complains about not having sufficient device memory. I tried increasing the workspace size, but the error persists. A verbose log of a run with the increased workspace is in the attached file below, and a non-verbose run is pasted as formatted text. I also tried decreasing the workspace to 4 GB, but that didn’t work either.

16GiB-verbose.txt (396.8 KB)

/./usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=/home/nvidia/repos/128x128_2021-09-21.onnx --saveEngine=model.engine --workspace=6000 --buildOnly --shapes=inputs:160x128x128x3
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /./usr/src/tensorrt/bin/trtexec --explicitBatch --onnx=/home/nvidia/repos/128x128_2021-09-21.onnx --saveEngine=model.engine --workspace=6000 --buildOnly --shapes=inputs:160x128x128x3
[11/01/2021-13:22:37] [I] === Model Options ===
[11/01/2021-13:22:37] [I] Format: ONNX
[11/01/2021-13:22:37] [I] Model: /home/nvidia/repos/128x128_2021-09-21.onnx
[11/01/2021-13:22:37] [I] Output:
[11/01/2021-13:22:37] [I] === Build Options ===
[11/01/2021-13:22:37] [I] Max batch: explicit
[11/01/2021-13:22:37] [I] Workspace: 6000 MiB
[11/01/2021-13:22:37] [I] minTiming: 1
[11/01/2021-13:22:37] [I] avgTiming: 8
[11/01/2021-13:22:37] [I] Precision: FP32
[11/01/2021-13:22:37] [I] Calibration: 
[11/01/2021-13:22:37] [I] Refit: Disabled
[11/01/2021-13:22:37] [I] Sparsity: Disabled
[11/01/2021-13:22:37] [I] Safe mode: Disabled
[11/01/2021-13:22:37] [I] Restricted mode: Disabled
[11/01/2021-13:22:37] [I] Save engine: model.engine
[11/01/2021-13:22:37] [I] Load engine: 
[11/01/2021-13:22:37] [I] NVTX verbosity: 0
[11/01/2021-13:22:37] [I] Tactic sources: Using default tactic sources
[11/01/2021-13:22:37] [I] timingCacheMode: local
[11/01/2021-13:22:37] [I] timingCacheFile: 
[11/01/2021-13:22:37] [I] Input(s)s format: fp32:CHW
[11/01/2021-13:22:37] [I] Output(s)s format: fp32:CHW
[11/01/2021-13:22:37] [I] Input build shape: inputs=160x128x128x3+160x128x128x3+160x128x128x3
[11/01/2021-13:22:37] [I] Input calibration shapes: model
[11/01/2021-13:22:37] [I] === System Options ===
[11/01/2021-13:22:37] [I] Device: 0
[11/01/2021-13:22:37] [I] DLACore: 
[11/01/2021-13:22:37] [I] Plugins:
[11/01/2021-13:22:37] [I] === Inference Options ===
[11/01/2021-13:22:37] [I] Batch: Explicit
[11/01/2021-13:22:37] [I] Input inference shape: inputs=160x128x128x3
[11/01/2021-13:22:37] [I] Iterations: 10
[11/01/2021-13:22:37] [I] Duration: 3s (+ 200ms warm up)
[11/01/2021-13:22:37] [I] Sleep time: 0ms
[11/01/2021-13:22:37] [I] Streams: 1
[11/01/2021-13:22:37] [I] ExposeDMA: Disabled
[11/01/2021-13:22:37] [I] Data transfers: Enabled
[11/01/2021-13:22:37] [I] Spin-wait: Disabled
[11/01/2021-13:22:37] [I] Multithreading: Disabled
[11/01/2021-13:22:37] [I] CUDA Graph: Disabled
[11/01/2021-13:22:37] [I] Separate profiling: Disabled
[11/01/2021-13:22:37] [I] Time Deserialize: Disabled
[11/01/2021-13:22:37] [I] Time Refit: Disabled
[11/01/2021-13:22:37] [I] Skip inference: Enabled
[11/01/2021-13:22:37] [I] Inputs:
[11/01/2021-13:22:37] [I] === Reporting Options ===
[11/01/2021-13:22:37] [I] Verbose: Disabled
[11/01/2021-13:22:37] [I] Averages: 10 inferences
[11/01/2021-13:22:37] [I] Percentile: 99
[11/01/2021-13:22:37] [I] Dump refittable layers:Disabled
[11/01/2021-13:22:37] [I] Dump output: Disabled
[11/01/2021-13:22:37] [I] Profile: Disabled
[11/01/2021-13:22:37] [I] Export timing to JSON file: 
[11/01/2021-13:22:37] [I] Export output to JSON file: 
[11/01/2021-13:22:37] [I] Export profile to JSON file: 
[11/01/2021-13:22:37] [I] 
[11/01/2021-13:22:37] [I] === Device Information ===
[11/01/2021-13:22:37] [I] Selected Device: Xavier
[11/01/2021-13:22:37] [I] Compute Capability: 7.2
[11/01/2021-13:22:37] [I] SMs: 8
[11/01/2021-13:22:37] [I] Compute Clock Rate: 1.377 GHz
[11/01/2021-13:22:37] [I] Device Global Memory: 31928 MiB
[11/01/2021-13:22:37] [I] Shared Memory per SM: 96 KiB
[11/01/2021-13:22:37] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/01/2021-13:22:37] [I] Memory Clock Rate: 1.377 GHz
[11/01/2021-13:22:37] [I] 
[11/01/2021-13:22:37] [I] TensorRT version: 8001
[11/01/2021-13:22:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 2730 (MiB)
[11/01/2021-13:22:39] [I] Start parsing network model
[11/01/2021-13:22:39] [I] [TRT] ----------------------------------------------------------------
[11/01/2021-13:22:39] [I] [TRT] Input filename:   /home/nvidia/repos/128x128_2021-09-21.onnx
[11/01/2021-13:22:39] [I] [TRT] ONNX IR version:  0.0.4
[11/01/2021-13:22:39] [I] [TRT] Opset version:    9
[11/01/2021-13:22:39] [I] [TRT] Producer name:    keras2onnx
[11/01/2021-13:22:39] [I] [TRT] Producer version: 1.8.1
[11/01/2021-13:22:39] [I] [TRT] Domain:           onnxmltools
[11/01/2021-13:22:39] [I] [TRT] Model version:    0
[11/01/2021-13:22:39] [I] [TRT] Doc string:       
[11/01/2021-13:22:39] [I] [TRT] ----------------------------------------------------------------
[11/01/2021-13:22:39] [I] Finish parsing network model
[11/01/2021-13:22:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 391, GPU 2810 (MiB)
[11/01/2021-13:22:39] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 391 MiB, GPU 2810 MiB
[11/01/2021-13:22:39] [I] [TRT] ---------- Layers Running on DLA ----------
[11/01/2021-13:22:39] [I] [TRT] ---------- Layers Running on GPU ----------
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity4
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity31
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose36
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose37
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity30
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose34
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose35
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity29
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose32
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_1
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose33
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity28
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose30
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_1
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose31
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity27
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_1)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity26
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose28
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_2
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose29
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity25
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose26
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_2
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose27
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity24
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_2)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose24
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_3
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose25
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity23
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose22
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_3
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose23
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity22
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_3)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose20
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_4
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose21
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity21
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose18
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_4
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose19
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity20
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_4)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity19
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose16
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_transpose
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose17
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity18
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose14
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_5
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose15
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity17
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_5)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity16
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose12
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_transpose_1
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose13
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity15
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose10
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_6
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose11
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity14
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_6)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity13
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose8
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_transpose_2
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose9
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity12
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose6
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_7
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose7
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity11
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_7)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity10
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose4
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_transpose_3
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose5
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity9
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose2
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] batch_normalization_8
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose3
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity8
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(elu_8)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] conv2d_transpose_4
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Transpose1
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity7
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] PWN(Sigmoid)
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity6
[11/01/2021-13:22:39] [I] [TRT] [GpuLayer] Identity5
[11/01/2021-13:22:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +334, now: CPU 618, GPU 3150 (MiB)
[11/01/2021-13:22:42] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +513, now: CPU 925, GPU 3663 (MiB)
[11/01/2021-13:22:42] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[11/01/2021-13:23:53] [E] [TRT] Tactic Device request: 2790MB Available: 1536MB. Device memory is insufficient to use tactic.
[11/01/2021-13:23:53] [W] [TRT] Skipping tactic 2 due to oom error on requested size of 2790 detected for tactic 2.
[11/01/2021-13:24:02] [E] [TRT] Tactic Device request: 2790MB Available: 1536MB. Device memory is insufficient to use tactic.
[11/01/2021-13:24:02] [W] [TRT] Skipping tactic 6 due to oom error on requested size of 2790 detected for tactic 58.
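
To make the mismatch in the log easier to see, here is a minimal Python sketch that pulls the requested and available sizes out of the "Tactic Device request" error lines. The function name `summarize_tactic_oom` is only illustrative; the regular expression assumes the message format stays exactly as printed above.

```python
import re

# Matches the error lines trtexec printed in the log above.
TACTIC_OOM = re.compile(r"Tactic Device request: (\d+)MB Available: (\d+)MB")


def summarize_tactic_oom(log_text):
    """Return (requested_mb, available_mb) for every tactic OOM error line."""
    return [(int(req), int(avail)) for req, avail in TACTIC_OOM.findall(log_text)]


# Two error lines copied from the log above.
sample = """\
[11/01/2021-13:23:53] [E] [TRT] Tactic Device request: 2790MB Available: 1536MB. Device memory is insufficient to use tactic.
[11/01/2021-13:24:02] [E] [TRT] Tactic Device request: 2790MB Available: 1536MB. Device memory is insufficient to use tactic.
"""

for requested, available in summarize_tactic_oom(sample):
    shortfall = requested - available
    print(f"requested {requested} MB, available {available} MB, short by {shortfall} MB")
```

In both errors the builder reports only 1536 MB as available, although the run was started with a 6000 MiB workspace.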

Hi,

Does the 160 indicate the batch size?
If so, could you try it with a smaller value?

Thanks.

Hi!
It does. It didn’t work for 80, but 40 seems to work. However, our model is trained for a batch size of 160. Furthermore, a batch size of 160 worked perfectly with the older version of TensorRT (7.1) and its JetPack release.
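
For anyone repeating the batch-size test, here is a small Python sketch that assembles the same trtexec command line for each batch size tried in this thread. It only prints the commands. The helper name `trtexec_command` is made up for illustration, the flags are the ones from the original command, and the `128x128x3` part of the shape is assumed to stay fixed.

```python
import shlex

# Path and flags taken from the command in the first post.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"
ONNX_PATH = "/home/nvidia/repos/128x128_2021-09-21.onnx"


def trtexec_command(batch, workspace_mib, onnx_path=ONNX_PATH, engine_path="model.engine"):
    """Build the trtexec argument list for one batch size (hypothetical helper)."""
    return [
        TRTEXEC,
        "--explicitBatch",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mib}",
        "--buildOnly",
        f"--shapes=inputs:{batch}x128x128x3",
    ]


# Batch sizes mentioned in this thread: 160 and 80 failed, 40 built.
for batch in (160, 80, 40):
    cmd = trtexec_command(batch, 6000)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The argument list can be passed to `subprocess.run` directly if you want the script to launch the builds itself.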

Hi,

I suppose you mean TensorRT 7.1.3 (version 7.2 is not available for Jetson).
When running it with batch size 40, could you monitor the memory usage and share it with us?

$ sudo tegrastats
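
If the tegrastats output is captured to a file, the peak memory use can be extracted with a short Python sketch like the one below. It assumes each line contains a field of the form `RAM <used>/<total>MB`. The example lines use made-up numbers purely to show the format and are not taken from the logs attached to this thread; `peak_ram_mb` is an illustrative name.

```python
import re

# Assumed tegrastats field format: "RAM <used>/<total>MB"
RAM_FIELD = re.compile(r"RAM (\d+)/(\d+)MB")


def peak_ram_mb(lines):
    """Return (peak_used_mb, total_mb), or None if no RAM field was found."""
    peak = None
    total = None
    for line in lines:
        match = RAM_FIELD.search(line)
        if match is None:
            continue  # skip lines without a RAM field
        used = int(match.group(1))
        total = int(match.group(2))
        if peak is None or used > peak:
            peak = used
    return None if peak is None else (peak, total)


# Illustrative lines only: the numbers are invented, not measured.
example = [
    "RAM 3000/31928MB (lfb 100x4MB) SWAP 0/15964MB",
    "RAM 5200/31928MB (lfb 90x4MB) SWAP 0/15964MB",
    "RAM 4100/31928MB (lfb 95x4MB) SWAP 0/15964MB",
]

print(peak_ram_mb(example))
```

With a real capture, pass the open file object instead of `example`, for instance `peak_ram_mb(open("tegrastats.log"))`, where the file name is whatever you saved the output to.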

Thanks.

Hi. You’re right, it was 7.1.

I recorded the tegrastats output for a few different conversions. Only the one with batch size 40 succeeded.

batch040_worksize4048.log (37.7 KB)
batch160_worksize4048.log (105.0 KB)
batch160_worksize8096.log (92.8 KB)

Hi,

Since you are using 32GiB Xavier, would you mind trying batch160+worksize16GiB?
If it is still not working, would you mind sharing your model for our testing?

Thanks.

Hi.
I tried with 16 GiB, but I got the same errors. However, I’m not allowed by my employer to share the model on the forum. Can I share it with you directly?

Hi,

Could you share it via private message?
Thanks.

Hi,

Thanks for your model.

We can confirm that the issue reproduces internally as well.
After checking with the internal team, we found that this issue is fixed in our next TensorRT release.

However, we don’t currently have a quick solution for JetPack 4.6.
Please wait for our next release, which will include the fix.

Thanks and sorry for the inconvenience.

@AstaLLL, can you confirm the TensorRT release version where the fix will be included?
