trtexec fails to create an engine from an ONNX file with --fp16

Description

When I try to generate an engine file from an ONNX model, the process stops with the message "Killed". This happens when the process runs out of CPU (host) memory. The error occurs only when I use --fp16, and only on Ampere GPUs.

The last lines of the extended log output

[07/01/2022-05:42:58] [V] [TRT] --------------- Timing Runner: PWN(fc.2.weight + (Unnamed Layer* 404) [Shuffle], PRelu_397) (PointWise)
[07/01/2022-05:42:58] [V] [TRT] Tactic: 128 Time: 0.004172
[07/01/2022-05:42:58] [V] [TRT] Tactic: 256 Time: 0.004424
[07/01/2022-05:42:58] [V] [TRT] Tactic: 512 Time: 0.004096
[07/01/2022-05:42:58] [V] [TRT] Tactic: -32 Time: 0.033792
[07/01/2022-05:42:58] [V] [TRT] Tactic: -64 Time: 0.01856
[07/01/2022-05:42:58] [V] [TRT] Tactic: -128 Time: 0.011264
[07/01/2022-05:42:58] [V] [TRT] Fastest Tactic: 512 Time: 0.004096
[07/01/2022-05:42:58] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: PointWiseV2 Tactic: 1
[07/01/2022-05:42:58] [V] [TRT] *************** Autotuning format combination: Half(256,1) -> Half(256,1) ***************
[07/01/2022-05:42:58] [V] [TRT] --------------- Timing Runner: PWN(fc.2.weight + (Unnamed Layer* 404) [Shuffle], PRelu_397) (PointWiseV2)
[07/01/2022-05:42:58] [V] [TRT] Tactic: 0 Time: 0.00316
[07/01/2022-05:42:58] [V] [TRT] Tactic: 1 Time: 0.003384
[07/01/2022-05:42:59] [V] [TRT] Tactic: 2 Time: 0.003116
[07/01/2022-05:42:59] [V] [TRT] Tactic: 3 Time: 0.00406
[07/01/2022-05:42:59] [V] [TRT] Tactic: 4 Time: 0.003312
[07/01/2022-05:42:59] [V] [TRT] Tactic: 5 Time: 0.003072
[07/01/2022-05:43:00] [V] [TRT] Tactic: 6 Time: 0.004404
[07/01/2022-05:43:00] [V] [TRT] Tactic: 7 Time: 0.004036
[07/01/2022-05:43:00] [V] [TRT] Tactic: 8 Time: 0.003328
[07/01/2022-05:43:00] [V] [TRT] Tactic: 9 Time: 0.003072
[07/01/2022-05:43:01] [V] [TRT] Tactic: 28 Time: 0.003168
[07/01/2022-05:43:01] [V] [TRT] Fastest Tactic: 5 Time: 0.003072
[07/01/2022-05:43:01] [V] [TRT] --------------- Timing Runner: PWN(fc.2.weight + (Unnamed Layer* 404) [Shuffle], PRelu_397) (PointWise)
[07/01/2022-05:43:01] [V] [TRT] Tactic: 128 Time: 0.00684
[07/01/2022-05:43:01] [V] [TRT] Tactic: 256 Time: 0.008576
[07/01/2022-05:43:01] [V] [TRT] Tactic: 512 Time: 0.007624
[07/01/2022-05:43:01] [V] [TRT] Tactic: -32 Time: 0.032864
[07/01/2022-05:43:01] [V] [TRT] Tactic: -64 Time: 0.01842
[07/01/2022-05:43:01] [V] [TRT] Tactic: -128 Time: 0.011392
[07/01/2022-05:43:01] [V] [TRT] Fastest Tactic: 128 Time: 0.00684
[07/01/2022-05:43:01] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: PointWiseV2 Tactic: 5
Killed

Environment

TensorRT Version: 8.2.5.1
GPU Type: NVIDIA GeForce RTX 3070
Nvidia Driver Version: 510.73.05
CUDA Version: 11.6
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:22.05-py3

Relevant Files

model.onnx

Steps To Reproduce

  1. docker run -it --rm --gpus=all -v $(pwd)/models:/models nvcr.io/nvidia/tensorrt:22.05-py3 /bin/bash
  2. trtexec --onnx=/models/models.onnx --fp16 --verbose
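
A bare "Killed" line with no TensorRT error usually means the process received SIGKILL, which is what the Linux out-of-memory killer sends. The following is a minimal sketch, not part of the original report, for confirming that and for recording peak host memory. It assumes Linux and uses only the Python standard library; `run_and_measure` is a hypothetical helper name.

```python
import resource
import signal
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd and return (returncode, peak RSS of waited-for child processes)."""
    proc = subprocess.run(cmd)
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    code, peak = run_and_measure(sys.argv[1:])
    # subprocess reports death by signal N as returncode -N.
    if code == -signal.SIGKILL:
        print("process was killed by SIGKILL (consistent with the OOM killer)")
    print(f"exit code: {code}, peak RSS (ru_maxrss): {peak}")
```

Saved under any file name, the script takes the trtexec command from step 2 as its arguments and runs it unchanged inside the container.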

Hi,
Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

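import sys
import onnx

# Path to the ONNX model, passed as the first command-line argument.
filename = sys.argv[1]
model = onnx.load(filename)
# Raises an exception if the model is structurally invalid.
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.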

If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

  1. ONNX model checked, everything is fine.
  2. I ran everything through trtexec; the command is given in the first message. The ONNX model is attached (link in the first message). I am also adding the full log file (link).

Hi,

Could you please let us know how this ONNX model was generated?
Also, make sure the model has at least one output marked correctly.

Thank you.
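
One way to check the marked outputs, and the opset the file was exported with, is to read them from the loaded model. This is a minimal sketch assuming the `onnx` Python package is installed; `summarize_model` is a hypothetical helper name, and it only reads the standard `opset_import` and `graph.output` fields of the model.

```python
import sys


def summarize_model(model):
    """Return ({domain: opset version}, [graph output names]) for a loaded ONNX model."""
    # An empty domain string denotes the default ONNX operator set ("ai.onnx").
    opsets = {imp.domain or "ai.onnx": imp.version for imp in model.opset_import}
    outputs = [out.name for out in model.graph.output]
    return opsets, outputs


if __name__ == "__main__" and len(sys.argv) > 1:
    import onnx  # third-party package, imported only when a model path is given

    opsets, outputs = summarize_model(onnx.load(sys.argv[1]))
    print("opsets:", opsets)
    print("outputs:", outputs)
    if not outputs:
        print("warning: the graph has no outputs marked")
```

An empty output list would mean that no tensor is marked as a graph output.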

I convert the PyTorch model to ONNX using torch.onnx.export. Below is part of the export code.


....
model = build_model(**model_kwargs(cfg, num_classes=0))
load_pretrained_weights(model, cfg.model.load_weights)
model.eval()

input_blob = torch.rand(16, 3, 256, 128)

input_names = ['input']
output_names = ['output']
dynamic_axes = {}

output_file_path = "model.onnx"

with torch.no_grad():
    torch.onnx.export(
        model,
        input_blob,
        output_file_path,
        export_params=True,
        input_names=input_names,
        output_names=output_names,
        dynamic_axes=dynamic_axes,
        opset_version=9,
        operator_export_type=torch.onnx.OperatorExportTypes.ONNX,
    )

I checked the model file at the link in the first post. Here is my screenshot. It reads fine. I ran trtexec in a container nvcr.io/nvidia/tensorrt:22.05-py3

[Screenshot: Selection_001]

Could you please generate the ONNX model with the latest opset version you can, up to opset 17?
We also recommend trying the latest TensorRT version, 8.4 GA. If you still face this issue, please share the latest model with us.
https://developer.nvidia.com/nvidia-tensorrt-8x-download

Thank you.

I cannot use the latest version of TensorRT. I use DeepStream docker images to run the application, and the latest DeepStream image ships TensorRT 8.2.5.1, the version on which I hit the problem.

I cannot generate the ONNX model with opset 17 because torch does not support it yet (issue). I generated the model with opset 16 instead, but I hit the same issue: at a certain point all the RAM (I have 32 GB) runs out and the process dies.


model_opset16.onnx (8.9 MB)

Hi,

We could not reproduce the issue on the latest TensorRT version, 8.4 GA.
The TensorRT NGC container with v8.4 may be released soon.
Alternatively, you can try upgrading the TensorRT version in the container using the docs shared previously.

Thank you.