Help converting a pytorch model to TensorRT

I am trying to convert a PyTorch model used for SiamRPN tracking to run on the Xavier NX and have been having significant trouble.

The project github is here: https://github.com/STVIR/pysot, and the model is located here:
https://drive.google.com/drive/folders/1t62x56Jl7baUzPTo0QrC4jJnwvPZm-2m

When I try the following, I get the error shown below.
Thanks very much for any guidance.

I ran trtexec directly on the .pth file and got the following error:

$ /usr/src/tensorrt/bin/trtexec --onnx=model.pth
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth
[07/01/2020-12:43:58] [I] === Model Options ===
[07/01/2020-12:43:58] [I] Format: ONNX
[07/01/2020-12:43:58] [I] Model: model.pth
[07/01/2020-12:43:58] [I] Output:
[07/01/2020-12:43:58] [I] === Build Options ===
[07/01/2020-12:43:58] [I] Max batch: 1
[07/01/2020-12:43:58] [I] Workspace: 16 MB
[07/01/2020-12:43:58] [I] minTiming: 1
[07/01/2020-12:43:58] [I] avgTiming: 8
[07/01/2020-12:43:58] [I] Precision: FP32
[07/01/2020-12:43:58] [I] Calibration: 
[07/01/2020-12:43:58] [I] Safe mode: Disabled
[07/01/2020-12:43:58] [I] Save engine: 
[07/01/2020-12:43:58] [I] Load engine: 
[07/01/2020-12:43:58] [I] Builder Cache: Enabled
[07/01/2020-12:43:58] [I] NVTX verbosity: 0
[07/01/2020-12:43:58] [I] Inputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Outputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Input build shapes: model
[07/01/2020-12:43:58] [I] Input calibration shapes: model
[07/01/2020-12:43:58] [I] === System Options ===
[07/01/2020-12:43:58] [I] Device: 0
[07/01/2020-12:43:58] [I] DLACore: 
[07/01/2020-12:43:58] [I] Plugins:
[07/01/2020-12:43:58] [I] === Inference Options ===
[07/01/2020-12:43:58] [I] Batch: 1
[07/01/2020-12:43:58] [I] Input inference shapes: model
[07/01/2020-12:43:58] [I] Iterations: 10
[07/01/2020-12:43:58] [I] Duration: 3s (+ 200ms warm up)
[07/01/2020-12:43:58] [I] Sleep time: 0ms
[07/01/2020-12:43:58] [I] Streams: 1
[07/01/2020-12:43:58] [I] ExposeDMA: Disabled
[07/01/2020-12:43:58] [I] Spin-wait: Disabled
[07/01/2020-12:43:58] [I] Multithreading: Disabled
[07/01/2020-12:43:58] [I] CUDA Graph: Disabled
[07/01/2020-12:43:58] [I] Skip inference: Disabled
[07/01/2020-12:43:58] [I] Inputs:
[07/01/2020-12:43:58] [I] === Reporting Options ===
[07/01/2020-12:43:58] [I] Verbose: Disabled
[07/01/2020-12:43:58] [I] Averages: 10 inferences
[07/01/2020-12:43:58] [I] Percentile: 99
[07/01/2020-12:43:58] [I] Dump output: Disabled
[07/01/2020-12:43:58] [I] Profile: Disabled
[07/01/2020-12:43:58] [I] Export timing to JSON file: 
[07/01/2020-12:43:58] [I] Export output to JSON file: 
[07/01/2020-12:43:58] [I] Export profile to JSON file: 
[07/01/2020-12:43:58] [I] 
----------------------------------------------------------------
Input filename:   model.pth
ONNX IR version:  0.0.0
Opset version:    0
Producer name:    
Producer version: 
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[07/01/2020-12:44:00] [E] [TRT] Network must have at least one output
[07/01/2020-12:44:00] [E] [TRT] Network validation failed.
[07/01/2020-12:44:00] [E] Engine creation failed
[07/01/2020-12:44:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth

Hi,

There seems to be some misunderstanding here.

Please note that trtexec converts an ONNX model into a TensorRT engine; it cannot read a .pth file.
Please export the ONNX model from PyTorch first, instead of passing the .pth file directly:
https://pytorch.org/docs/master/onnx.html

Thanks.

Julie Bareeva at LearnOpenCV just released a mini tutorial on converting a PyTorch model to TensorRT.


I will try following that and post here with the results.

First, I tried the basic example from the page you linked:
https://pytorch.org/docs/master/onnx.html

and created an ONNX model as shown there, using the code

import torch
import torchvision

dummy_input = torch.randn(10, 3, 224, 224, device='cuda')
model = torchvision.models.alexnet(pretrained=True).cuda()

# Providing input and output names sets the display names for values
# within the model's graph. Setting these does not change the semantics
# of the graph; it is only for readability.
#
# The inputs to the network consist of the flat list of inputs (i.e.
# the values you would pass to the forward() method) followed by the
# flat list of parameters. You can partially specify names, i.e. provide
# a list here shorter than the number of inputs to the model, and we will
# only set that subset of names, starting from the beginning.
input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ]
output_names = [ "output1" ]

torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names)

This created the file “alexnet.onnx”.

I then ran the command

/usr/src/tensorrt/bin/trtexec --onnx=alexnet.onnx

and got the result:
        &&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=alexnet.onnx
        [07/11/2020-20:12:25] [I] === Model Options ===
        [07/11/2020-20:12:25] [I] Format: ONNX
        [07/11/2020-20:12:25] [I] Model: alexnet.onnx
        [07/11/2020-20:12:25] [I] Output:
        [07/11/2020-20:12:25] [I] === Build Options ===
        [07/11/2020-20:12:25] [I] Max batch: 1
        [07/11/2020-20:12:25] [I] Workspace: 16 MB
        [07/11/2020-20:12:25] [I] minTiming: 1
        [07/11/2020-20:12:25] [I] avgTiming: 8
        [07/11/2020-20:12:25] [I] Precision: FP32
        [07/11/2020-20:12:25] [I] Calibration: 
        [07/11/2020-20:12:25] [I] Safe mode: Disabled
        [07/11/2020-20:12:25] [I] Save engine: 
        [07/11/2020-20:12:25] [I] Load engine: 
        [07/11/2020-20:12:25] [I] Builder Cache: Enabled
        [07/11/2020-20:12:25] [I] NVTX verbosity: 0
        [07/11/2020-20:12:25] [I] Inputs format: fp32:CHW
        [07/11/2020-20:12:25] [I] Outputs format: fp32:CHW
        [07/11/2020-20:12:25] [I] Input build shapes: model
        [07/11/2020-20:12:25] [I] Input calibration shapes: model
        [07/11/2020-20:12:25] [I] === System Options ===
        [07/11/2020-20:12:25] [I] Device: 0
        [07/11/2020-20:12:25] [I] DLACore: 
        [07/11/2020-20:12:25] [I] Plugins:
        [07/11/2020-20:12:25] [I] === Inference Options ===
        [07/11/2020-20:12:25] [I] Batch: 1
        [07/11/2020-20:12:25] [I] Input inference shapes: model
        [07/11/2020-20:12:25] [I] Iterations: 10
        [07/11/2020-20:12:25] [I] Duration: 3s (+ 200ms warm up)
        [07/11/2020-20:12:25] [I] Sleep time: 0ms
        [07/11/2020-20:12:25] [I] Streams: 1
        [07/11/2020-20:12:25] [I] ExposeDMA: Disabled
        [07/11/2020-20:12:25] [I] Spin-wait: Disabled
        [07/11/2020-20:12:25] [I] Multithreading: Disabled
        [07/11/2020-20:12:25] [I] CUDA Graph: Disabled
        [07/11/2020-20:12:25] [I] Skip inference: Disabled
        [07/11/2020-20:12:25] [I] Inputs:
        [07/11/2020-20:12:25] [I] === Reporting Options ===
        [07/11/2020-20:12:25] [I] Verbose: Disabled
        [07/11/2020-20:12:25] [I] Averages: 10 inferences
        [07/11/2020-20:12:25] [I] Percentile: 99
        [07/11/2020-20:12:25] [I] Dump output: Disabled
        [07/11/2020-20:12:25] [I] Profile: Disabled
        [07/11/2020-20:12:25] [I] Export timing to JSON file: 
        [07/11/2020-20:12:25] [I] Export output to JSON file: 
        [07/11/2020-20:12:25] [I] Export profile to JSON file: 
        [07/11/2020-20:12:25] [I] 
        ----------------------------------------------------------------
        Input filename:   alexnet.onnx
        ONNX IR version:  0.0.4
        Opset version:    9
        Producer name:    pytorch
        Producer version: 1.3
        Domain:           
        Model version:    0
        Doc string:       
        ----------------------------------------------------------------
        [07/11/2020-20:12:31] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
        [07/11/2020-20:12:31] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
        [07/11/2020-20:12:31] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
        [07/11/2020-20:12:31] [I] [TRT] 
        [07/11/2020-20:12:31] [I] [TRT] --------------- Layers running on DLA: 
        [07/11/2020-20:12:31] [I] [TRT] 
        [07/11/2020-20:12:31] [I] [TRT] --------------- Layers running on GPU: 
        [07/11/2020-20:12:31] [I] [TRT] (Unnamed Layer* 0) [Convolution] + (Unnamed Layer* 1) [Activation], (Unnamed Layer* 2) [Pooling], (Unnamed Layer* 3) [Convolution] + (Unnamed Layer* 4) [Activation], (Unnamed Layer* 5) [Pooling], (Unnamed Layer* 6) [Convolution] + (Unnamed Layer* 7) [Activation], (Unnamed Layer* 8) [Convolution] + (Unnamed Layer* 9) [Activation], (Unnamed Layer* 10) [Convolution] + (Unnamed Layer* 11) [Activation], (Unnamed Layer* 12) [Pooling], (Unnamed Layer* 13) [Pooling], (Unnamed Layer* 14) [Shuffle], (Unnamed Layer* 16) [Constant], (Unnamed Layer* 17) [Matrix Multiply], (Unnamed Layer* 18) [Constant] + (Unnamed Layer* 19) [Shuffle], (Unnamed Layer* 20) [ElementWise] + (Unnamed Layer* 21) [Activation], (Unnamed Layer* 23) [Constant], (Unnamed Layer* 24) [Matrix Multiply], (Unnamed Layer* 25) [Constant] + (Unnamed Layer* 26) [Shuffle], (Unnamed Layer* 27) [ElementWise] + (Unnamed Layer* 28) [Activation], (Unnamed Layer* 30) [Constant], (Unnamed Layer* 31) [Matrix Multiply], (Unnamed Layer* 32) [Constant] + (Unnamed Layer* 33) [Shuffle], (Unnamed Layer* 34) [ElementWise], 
        [07/11/2020-20:12:37] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
        [07/11/2020-20:12:45] [I] [TRT] Detected 1 inputs and 1 output network tensors.
        [07/11/2020-20:12:46] [I] Starting inference threads
        [07/11/2020-20:12:49] [I] Warmup completed 8 queries over 200 ms
        [07/11/2020-20:12:49] [I] Timing trace has 119 queries over 3.06582 s
        [07/11/2020-20:12:49] [I] Trace averages of 10 runs:
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 24.3995 ms - Host latency: 24.6575 ms (end to end 24.6674 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 24.6877 ms - Host latency: 24.9496 ms (end to end 24.997 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.0552 ms - Host latency: 25.3201 ms (end to end 25.3291 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.6317 ms - Host latency: 25.8955 ms (end to end 25.9071 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.8685 ms - Host latency: 26.1265 ms (end to end 26.1363 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.6341 ms - Host latency: 25.8938 ms (end to end 25.9044 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 26.0824 ms - Host latency: 26.3414 ms (end to end 26.3499 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.715 ms - Host latency: 25.9726 ms (end to end 25.9819 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.7232 ms - Host latency: 25.984 ms (end to end 26.0108 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.7114 ms - Host latency: 25.9733 ms (end to end 25.9825 ms)
        [07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.7021 ms - Host latency: 25.9606 ms (end to end 25.9693 ms)
        [07/11/2020-20:12:49] [I] Host latency
        [07/11/2020-20:12:49] [I] min: 24.4 ms (end to end 24.4104 ms)
        [07/11/2020-20:12:49] [I] max: 28.1171 ms (end to end 28.1255 ms)
        [07/11/2020-20:12:49] [I] mean: 25.7483 ms (end to end 25.7624 ms)
        [07/11/2020-20:12:49] [I] median: 25.8259 ms (end to end 25.8373 ms)
        [07/11/2020-20:12:49] [I] percentile: 27.9119 ms at 99% (end to end 27.9301 ms at 99%)
        [07/11/2020-20:12:49] [I] throughput: 38.8151 qps
        [07/11/2020-20:12:49] [I] walltime: 3.06582 s
        [07/11/2020-20:12:49] [I] GPU Compute
        [07/11/2020-20:12:49] [I] min: 24.1408 ms
        [07/11/2020-20:12:49] [I] max: 27.8528 ms
        [07/11/2020-20:12:49] [I] mean: 25.4878 ms
        [07/11/2020-20:12:49] [I] median: 25.5653 ms
        [07/11/2020-20:12:49] [I] percentile: 27.6552 ms at 99%
        [07/11/2020-20:12:49] [I] total compute time: 3.03305 s
        &&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=alexnet.onnx

Some questions:

  1. It looks like it successfully created a TensorRT engine. Where does that engine live now? I don’t see it.

  2. I am still not completely sure how to convert my own model; it is not like the simple example shown here.

Thanks!

The models I am looking to convert are located at the links in my first post.

Hi,

Sorry for the late update.

Saving the TensorRT engine to a file is disabled by default.
If you want the serialized engine file, please add --saveEngine=<file> when running trtexec:

$ /usr/src/tensorrt/bin/trtexec --onnx=alexnet.onnx --saveEngine=alexnet.trt

For your customized model, please generate the ONNX model from PyTorch first.
Then you can run inference on it with TensorRT using the command above.

Thanks.