AssertionError: Max workspace size for TensorRT inference should be positive, got 0

Hi. I have a problem with TLT inference for FasterRCNN with an EfficientNet-B0 backbone. Without a trt_engine it works fine, but with a trt_engine the error is:

2021-07-20 14:48:13,111 [INFO] __main__: Running inference with TensorRT as backend.
Traceback (most recent call last):
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/inference.py", line 236, in
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/inference.py", line 90, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/spec_loader/spec_wrapper.py", line 589, in infer_workspace_size
AssertionError: Max workspace size for TensorRT inference should be positive, got 0.
Traceback (most recent call last):
File "/usr/local/bin/faster_rcnn", line 8, in
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/entrypoint/faster_rcnn.py", line 12, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.

Run command: !faster_rcnn inference --gpu_index 0 -e $SPECS_DIR/default_spec_efficientnet_b0.txt

Thanks in advance.
default_spec_efficientnet_b0.txt (4.9 KB)
faster_rcnn.ipynb (524.9 KB)
trt.fp16.engine (9.4 MB)

Hi,
Can you try running your model with the trtexec command and share the --verbose log in case the issue persists?

You can refer to the link below for the full list of supported operators. If any operator is not supported, you will need to create a custom plugin to support that operation.

Also, please share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the links below:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#error-messaging
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/#faq
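
As for the assertion itself, the traceback points at infer_workspace_size in the spec loader, so the TensorRT inference section of your spec likely needs a positive workspace size. A minimal sketch of the relevant fragment is below; the field name max_workspace_size_MB and its placement are assumptions, so please verify them against the FasterRCNN spec documentation for your TLT version:

inference_config {
  trt_inference {
    # Hypothetical fragment: field name and engine path are illustrative only.
    trt_engine: '/workspace/trt.fp16.engine'
    # Must be a positive value; 0 (or leaving it unset, if it defaults to 0)
    # triggers "Max workspace size for TensorRT inference should be positive".
    max_workspace_size_MB: 2000
  }
}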

Thanks!

I am using the docker image nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3, with the FasterRCNN / EfficientNet-B0 architecture for object detection. I do not have trtexec inside the container, so I ran it on the host:

  • F:\TensorRT-7.2.1.6\bin>trtexec --loadEngine=trt_fp162.engine --batch=1 --useDLACore=0 --fp16 --verbose
    &&&& RUNNING TensorRT.trtexec # trtexec --loadEngine=trt_fp162.engine --batch=1 --useDLACore=0 --fp16 --verbose
    [07/21/2021-17:07:49] [I] === Model Options ===
    [07/21/2021-17:07:49] [I] Format: *
    [07/21/2021-17:07:49] [I] Model:
    [07/21/2021-17:07:49] [I] Output:
    [07/21/2021-17:07:49] [I] === Build Options ===
    [07/21/2021-17:07:49] [I] Max batch: 1
    [07/21/2021-17:07:49] [I] Workspace: 16 MiB
    [07/21/2021-17:07:49] [I] minTiming: 1
    [07/21/2021-17:07:49] [I] avgTiming: 8
    [07/21/2021-17:07:49] [I] Precision: FP32+FP16
    [07/21/2021-17:07:49] [I] Calibration:
    [07/21/2021-17:07:49] [I] Refit: Disabled
    [07/21/2021-17:07:49] [I] Safe mode: Disabled
    [07/21/2021-17:07:49] [I] Save engine:
    [07/21/2021-17:07:49] [I] Load engine: trt_fp162.engine
    [07/21/2021-17:07:49] [I] Builder Cache: Enabled
    [07/21/2021-17:07:49] [I] NVTX verbosity: 0
    [07/21/2021-17:07:49] [I] Tactic sources: Using default tactic sources
    [07/21/2021-17:07:49] [I] Input(s)s format: fp32:CHW
    [07/21/2021-17:07:49] [I] Output(s)s format: fp32:CHW
    [07/21/2021-17:07:49] [I] Input build shapes: model
    [07/21/2021-17:07:49] [I] Input calibration shapes: model
    [07/21/2021-17:07:49] [I] === System Options ===
    [07/21/2021-17:07:49] [I] Device: 0
    [07/21/2021-17:07:49] [I] DLACore: 0
    [07/21/2021-17:07:49] [I] Plugins:
    [07/21/2021-17:07:49] [I] === Inference Options ===
    [07/21/2021-17:07:49] [I] Batch: 1
    [07/21/2021-17:07:49] [I] Input inference shapes: model
    [07/21/2021-17:07:49] [I] Iterations: 10
    [07/21/2021-17:07:49] [I] Duration: 3s (+ 200ms warm up)
    [07/21/2021-17:07:49] [I] Sleep time: 0ms
    [07/21/2021-17:07:49] [I] Streams: 1
    [07/21/2021-17:07:49] [I] ExposeDMA: Disabled
    [07/21/2021-17:07:49] [I] Data transfers: Enabled
    [07/21/2021-17:07:49] [I] Spin-wait: Disabled
    [07/21/2021-17:07:49] [I] Multithreading: Disabled
    [07/21/2021-17:07:49] [I] CUDA Graph: Disabled
    [07/21/2021-17:07:49] [I] Separate profiling: Disabled
    [07/21/2021-17:07:49] [I] Skip inference: Disabled
    [07/21/2021-17:07:49] [I] Inputs:
    [07/21/2021-17:07:49] [I] === Reporting Options ===
    [07/21/2021-17:07:49] [I] Verbose: Enabled
    [07/21/2021-17:07:49] [I] Averages: 10 inferences
    [07/21/2021-17:07:49] [I] Percentile: 99
    [07/21/2021-17:07:49] [I] Dump refittable layers: Disabled
    [07/21/2021-17:07:49] [I] Dump output: Disabled
    [07/21/2021-17:07:49] [I] Profile: Disabled
    [07/21/2021-17:07:49] [I] Export timing to JSON file:
    [07/21/2021-17:07:49] [I] Export output to JSON file:
    [07/21/2021-17:07:49] [I] Export profile to JSON file:
    [07/21/2021-17:07:49] [I]
    [07/21/2021-17:07:49] [I] === Device Information ===
    [07/21/2021-17:07:49] [I] Selected Device: GeForce RTX 2080 SUPER
    [07/21/2021-17:07:49] [I] Compute Capability: 7.5
    [07/21/2021-17:07:49] [I] SMs: 48
    [07/21/2021-17:07:49] [I] Compute Clock Rate: 1.86 GHz
    [07/21/2021-17:07:49] [I] Device Global Memory: 8192 MiB
    [07/21/2021-17:07:49] [I] Shared Memory per SM: 64 KiB
    [07/21/2021-17:07:49] [I] Memory Bus Width: 256 bits (ECC disabled)
    [07/21/2021-17:07:49] [I] Memory Clock Rate: 7.751 GHz
    [07/21/2021-17:07:49] [I]
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::Proposal version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::Split version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
    [07/21/2021-17:07:49] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
    [07/21/2021-17:07:49] [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
    [07/21/2021-17:07:50] [V] [TRT] Deserialize required 972655 microseconds.
    [07/21/2021-17:07:50] [I] Engine loaded in 1.45353 sec.
    [07/21/2021-17:07:50] [V] [TRT] Allocated persistent device memory of size 7939072
    [07/21/2021-17:07:50] [V] [TRT] Allocated activation device memory of size 82851840
    [07/21/2021-17:07:50] [V] [TRT] Assigning persistent memory blocks for various profiles
    [07/21/2021-17:07:50] [I] Starting inference
    [07/21/2021-17:07:54] [I] Warmup completed 18 queries over 200 ms
    [07/21/2021-17:07:54] [I] Timing trace has 288 queries over 3.03298 s
    [07/21/2021-17:07:54] [I] Trace averages of 10 runs:
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.8644 ms - Host latency: 11.1751 ms (end to end 21.6694 ms, enqueue 2.72661 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.1032 ms - Host latency: 10.3889 ms (end to end 20.2971 ms, enqueue 1.86987 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.338 ms - Host latency: 10.6703 ms (end to end 20.5076 ms, enqueue 4.05135 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.0758 ms - Host latency: 10.3887 ms (end to end 20.0577 ms, enqueue 3.01985 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.2733 ms - Host latency: 10.6237 ms (end to end 20.3859 ms, enqueue 4.01158 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.3905 ms - Host latency: 10.7418 ms (end to end 20.5607 ms, enqueue 4.63024 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.6382 ms - Host latency: 10.9901 ms (end to end 21.1092 ms, enqueue 3.66507 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.3935 ms - Host latency: 10.6974 ms (end to end 20.6492 ms, enqueue 2.73516 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.6827 ms - Host latency: 11.0438 ms (end to end 21.1739 ms, enqueue 4.13207 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.468 ms - Host latency: 10.7859 ms (end to end 20.8885 ms, enqueue 3.00642 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.0757 ms - Host latency: 10.3676 ms (end to end 19.9914 ms, enqueue 3.10895 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.5241 ms - Host latency: 10.8995 ms (end to end 20.6664 ms, enqueue 4.9746 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.6285 ms - Host latency: 10.9483 ms (end to end 21.2055 ms, enqueue 3.37981 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 9.94176 ms - Host latency: 10.2161 ms (end to end 19.8079 ms, enqueue 2.03817 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.0193 ms - Host latency: 10.3165 ms (end to end 19.9815 ms, enqueue 2.48997 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.2272 ms - Host latency: 10.586 ms (end to end 20.196 ms, enqueue 4.5922 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.6085 ms - Host latency: 10.9472 ms (end to end 20.9898 ms, enqueue 4.4601 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.5607 ms - Host latency: 10.9305 ms (end to end 21.0275 ms, enqueue 4.39774 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.2144 ms - Host latency: 10.5527 ms (end to end 20.2159 ms, enqueue 3.96094 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.3187 ms - Host latency: 10.664 ms (end to end 20.4078 ms, enqueue 4.64431 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.1022 ms - Host latency: 10.403 ms (end to end 20.077 ms, enqueue 2.70325 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.0407 ms - Host latency: 10.3199 ms (end to end 19.9902 ms, enqueue 2.28005 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.3156 ms - Host latency: 10.5926 ms (end to end 20.492 ms, enqueue 2.46331 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.4193 ms - Host latency: 10.7041 ms (end to end 20.7837 ms, enqueue 2.56497 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.1235 ms - Host latency: 10.4142 ms (end to end 20.0729 ms, enqueue 2.72463 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.0889 ms - Host latency: 10.46 ms (end to end 19.982 ms, enqueue 2.80728 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.9686 ms - Host latency: 11.259 ms (end to end 21.8492 ms, enqueue 1.47139 ms)
    [07/21/2021-17:07:54] [I] Average on 10 runs - GPU latency: 10.1757 ms - Host latency: 10.4552 ms (end to end 20.3678 ms, enqueue 1.06743 ms)
    [07/21/2021-17:07:54] [I] Host Latency
    [07/21/2021-17:07:54] [I] min: 9.88416 ms (end to end 19.1735 ms)
    [07/21/2021-17:07:54] [I] max: 13.6069 ms (end to end 24.0647 ms)
    [07/21/2021-17:07:54] [I] mean: 10.6616 ms (end to end 20.5486 ms)
    [07/21/2021-17:07:54] [I] median: 10.6414 ms (end to end 20.5294 ms)
    [07/21/2021-17:07:54] [I] percentile: 11.7842 ms at 99% (end to end 22.5961 ms at 99%)
    [07/21/2021-17:07:54] [I] throughput: 94.956 qps
    [07/21/2021-17:07:54] [I] walltime: 3.03298 s
    [07/21/2021-17:07:54] [I] Enqueue Time
    [07/21/2021-17:07:54] [I] min: 0.923584 ms
    [07/21/2021-17:07:54] [I] max: 6.36145 ms
    [07/21/2021-17:07:54] [I] median: 2.80911 ms
    [07/21/2021-17:07:54] [I] GPU Compute
    [07/21/2021-17:07:54] [I] min: 9.60632 ms
    [07/21/2021-17:07:54] [I] max: 13.3098 ms
    [07/21/2021-17:07:54] [I] mean: 10.3427 ms
    [07/21/2021-17:07:54] [I] median: 10.3157 ms
    [07/21/2021-17:07:54] [I] percentile: 11.5135 ms at 99%
    [07/21/2021-17:07:54] [I] total compute time: 2.9787 s
    &&&& PASSED TensorRT.trtexec # trtexec --loadEngine=trt_fp162.engine --batch=1 --useDLACore=0 --fp16 --verbose

Hi @nikolai0792,

This looks like it is related to TLT. We recommend posting your concern on the TLT forum to get better help.

Thank you.