I first tried running the basic example presented on the site you linked to:
https://pytorch.org/docs/master/onnx.html
and created an ONNX model using the code they show:
import torch
import torchvision
dummy_input = torch.randn(10, 3, 224, 224, device='cuda')
model = torchvision.models.alexnet(pretrained=True).cuda()
# Providing input and output names sets the display names for values
# within the model's graph. Setting these does not change the semantics
# of the graph; it is only for readability.
#
# The inputs to the network consist of the flat list of inputs (i.e.
# the values you would pass to the forward() method) followed by the
# flat list of parameters. You can partially specify names, i.e. provide
# a list here shorter than the number of inputs to the model, and we will
# only set that subset of names, starting from the beginning.
input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ]
output_names = [ "output1" ]
torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names)
This created the file “alexnet.onnx”.
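As an aside on the name list in the export code: AlexNet has 8 learnable layers (5 convolutional + 3 linear), each with a weight and a bias tensor, which is why 16 `learned_*` names follow the input name. A quick self-contained check of the list the export code builds (no torch needed, just the same list expression):

```python
# Reconstruct the name lists passed to torch.onnx.export above.
# AlexNet: 8 learnable layers x (weight + bias) = 16 parameter tensors,
# so 1 input name + 16 parameter names = 17 entries in total.
input_names = ["actual_input_1"] + ["learned_%d" % i for i in range(16)]
output_names = ["output1"]

print(len(input_names))    # 17: one input plus sixteen parameters
print(input_names[:3])     # first few display names
```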
I then ran the command
/usr/src/tensorrt/bin/trtexec --onnx=alexnet.onnx
and got the result
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=alexnet.onnx
[07/11/2020-20:12:25] [I] === Model Options ===
[07/11/2020-20:12:25] [I] Format: ONNX
[07/11/2020-20:12:25] [I] Model: alexnet.onnx
[07/11/2020-20:12:25] [I] Output:
[07/11/2020-20:12:25] [I] === Build Options ===
[07/11/2020-20:12:25] [I] Max batch: 1
[07/11/2020-20:12:25] [I] Workspace: 16 MB
[07/11/2020-20:12:25] [I] minTiming: 1
[07/11/2020-20:12:25] [I] avgTiming: 8
[07/11/2020-20:12:25] [I] Precision: FP32
[07/11/2020-20:12:25] [I] Calibration:
[07/11/2020-20:12:25] [I] Safe mode: Disabled
[07/11/2020-20:12:25] [I] Save engine:
[07/11/2020-20:12:25] [I] Load engine:
[07/11/2020-20:12:25] [I] Builder Cache: Enabled
[07/11/2020-20:12:25] [I] NVTX verbosity: 0
[07/11/2020-20:12:25] [I] Inputs format: fp32:CHW
[07/11/2020-20:12:25] [I] Outputs format: fp32:CHW
[07/11/2020-20:12:25] [I] Input build shapes: model
[07/11/2020-20:12:25] [I] Input calibration shapes: model
[07/11/2020-20:12:25] [I] === System Options ===
[07/11/2020-20:12:25] [I] Device: 0
[07/11/2020-20:12:25] [I] DLACore:
[07/11/2020-20:12:25] [I] Plugins:
[07/11/2020-20:12:25] [I] === Inference Options ===
[07/11/2020-20:12:25] [I] Batch: 1
[07/11/2020-20:12:25] [I] Input inference shapes: model
[07/11/2020-20:12:25] [I] Iterations: 10
[07/11/2020-20:12:25] [I] Duration: 3s (+ 200ms warm up)
[07/11/2020-20:12:25] [I] Sleep time: 0ms
[07/11/2020-20:12:25] [I] Streams: 1
[07/11/2020-20:12:25] [I] ExposeDMA: Disabled
[07/11/2020-20:12:25] [I] Spin-wait: Disabled
[07/11/2020-20:12:25] [I] Multithreading: Disabled
[07/11/2020-20:12:25] [I] CUDA Graph: Disabled
[07/11/2020-20:12:25] [I] Skip inference: Disabled
[07/11/2020-20:12:25] [I] Inputs:
[07/11/2020-20:12:25] [I] === Reporting Options ===
[07/11/2020-20:12:25] [I] Verbose: Disabled
[07/11/2020-20:12:25] [I] Averages: 10 inferences
[07/11/2020-20:12:25] [I] Percentile: 99
[07/11/2020-20:12:25] [I] Dump output: Disabled
[07/11/2020-20:12:25] [I] Profile: Disabled
[07/11/2020-20:12:25] [I] Export timing to JSON file:
[07/11/2020-20:12:25] [I] Export output to JSON file:
[07/11/2020-20:12:25] [I] Export profile to JSON file:
[07/11/2020-20:12:25] [I]
----------------------------------------------------------------
Input filename: alexnet.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.3
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[07/11/2020-20:12:31] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[07/11/2020-20:12:31] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[07/11/2020-20:12:31] [W] [TRT] Calling isShapeTensor before the entire network is constructed may result in an inaccurate result.
[07/11/2020-20:12:31] [I] [TRT]
[07/11/2020-20:12:31] [I] [TRT] --------------- Layers running on DLA:
[07/11/2020-20:12:31] [I] [TRT]
[07/11/2020-20:12:31] [I] [TRT] --------------- Layers running on GPU:
[07/11/2020-20:12:31] [I] [TRT] (Unnamed Layer* 0) [Convolution] + (Unnamed Layer* 1) [Activation], (Unnamed Layer* 2) [Pooling], (Unnamed Layer* 3) [Convolution] + (Unnamed Layer* 4) [Activation], (Unnamed Layer* 5) [Pooling], (Unnamed Layer* 6) [Convolution] + (Unnamed Layer* 7) [Activation], (Unnamed Layer* 8) [Convolution] + (Unnamed Layer* 9) [Activation], (Unnamed Layer* 10) [Convolution] + (Unnamed Layer* 11) [Activation], (Unnamed Layer* 12) [Pooling], (Unnamed Layer* 13) [Pooling], (Unnamed Layer* 14) [Shuffle], (Unnamed Layer* 16) [Constant], (Unnamed Layer* 17) [Matrix Multiply], (Unnamed Layer* 18) [Constant] + (Unnamed Layer* 19) [Shuffle], (Unnamed Layer* 20) [ElementWise] + (Unnamed Layer* 21) [Activation], (Unnamed Layer* 23) [Constant], (Unnamed Layer* 24) [Matrix Multiply], (Unnamed Layer* 25) [Constant] + (Unnamed Layer* 26) [Shuffle], (Unnamed Layer* 27) [ElementWise] + (Unnamed Layer* 28) [Activation], (Unnamed Layer* 30) [Constant], (Unnamed Layer* 31) [Matrix Multiply], (Unnamed Layer* 32) [Constant] + (Unnamed Layer* 33) [Shuffle], (Unnamed Layer* 34) [ElementWise],
[07/11/2020-20:12:37] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[07/11/2020-20:12:45] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[07/11/2020-20:12:46] [I] Starting inference threads
[07/11/2020-20:12:49] [I] Warmup completed 8 queries over 200 ms
[07/11/2020-20:12:49] [I] Timing trace has 119 queries over 3.06582 s
[07/11/2020-20:12:49] [I] Trace averages of 10 runs:
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 24.3995 ms - Host latency: 24.6575 ms (end to end 24.6674 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 24.6877 ms - Host latency: 24.9496 ms (end to end 24.997 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.0552 ms - Host latency: 25.3201 ms (end to end 25.3291 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.6317 ms - Host latency: 25.8955 ms (end to end 25.9071 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.8685 ms - Host latency: 26.1265 ms (end to end 26.1363 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.6341 ms - Host latency: 25.8938 ms (end to end 25.9044 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 26.0824 ms - Host latency: 26.3414 ms (end to end 26.3499 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.715 ms - Host latency: 25.9726 ms (end to end 25.9819 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.7232 ms - Host latency: 25.984 ms (end to end 26.0108 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.7114 ms - Host latency: 25.9733 ms (end to end 25.9825 ms)
[07/11/2020-20:12:49] [I] Average on 10 runs - GPU latency: 25.7021 ms - Host latency: 25.9606 ms (end to end 25.9693 ms)
[07/11/2020-20:12:49] [I] Host latency
[07/11/2020-20:12:49] [I] min: 24.4 ms (end to end 24.4104 ms)
[07/11/2020-20:12:49] [I] max: 28.1171 ms (end to end 28.1255 ms)
[07/11/2020-20:12:49] [I] mean: 25.7483 ms (end to end 25.7624 ms)
[07/11/2020-20:12:49] [I] median: 25.8259 ms (end to end 25.8373 ms)
[07/11/2020-20:12:49] [I] percentile: 27.9119 ms at 99% (end to end 27.9301 ms at 99%)
[07/11/2020-20:12:49] [I] throughput: 38.8151 qps
[07/11/2020-20:12:49] [I] walltime: 3.06582 s
[07/11/2020-20:12:49] [I] GPU Compute
[07/11/2020-20:12:49] [I] min: 24.1408 ms
[07/11/2020-20:12:49] [I] max: 27.8528 ms
[07/11/2020-20:12:49] [I] mean: 25.4878 ms
[07/11/2020-20:12:49] [I] median: 25.5653 ms
[07/11/2020-20:12:49] [I] percentile: 27.6552 ms at 99%
[07/11/2020-20:12:49] [I] total compute time: 3.03305 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=alexnet.onnx
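As a sanity check on the numbers (my own arithmetic, not trtexec output), the reported throughput is just the number of timed queries divided by the walltime from the trace:

```python
# Figures taken from the timing trace in the log above.
queries = 119          # "Timing trace has 119 queries"
walltime = 3.06582     # seconds, "walltime: 3.06582 s"

throughput = queries / walltime  # queries per second
print(throughput)                # close to the 38.8151 qps reported
```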
Some questions:
- It looks like it successfully created a TensorRT engine. Where does that engine live now? I don't see a file for it.
- I'm not completely sure how to convert my own model now; it isn't as simple as the example shown here.
Thanks!