Description
Polygraphy crashes when trying to mark all TensorRT nodes as outputs.
When --trt-outputs mark all is removed from the command line, it works.
Applying the Polygraphy sanitizing command did not help, and neither did onnx-simplifier.
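For reference, the cleanup passes were roughly the following (exact flags may have differed, and the output file names here are just placeholders); running the same polygraphy run command on either cleaned model hit the same failure:

polygraphy surgeon sanitize triton_models/model-original.onnx --fold-constants -o triton_models/model-sanitized.onnx
python -m onnxsim triton_models/model-original.onnx triton_models/model-simplified.onnx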
Environment
TensorRT Version : 8.2 (preview)
NVIDIA GPU : RTX 3090
NVIDIA Driver Version : 495.29.05
CUDA Version : 11.5
CUDNN Version : 8.3.0.98
Operating System : Linux Ubuntu 21.04
Python Version (if applicable) : 3.9
PyTorch Version (if applicable) : 1.10
Baremetal or Container (if so, version) : Baremetal
Relevant Files
https://drive.google.com/file/d/14wiCeBPTGtWRFdr8Z7-AVtlpCciHojxw/view?usp=sharing
Logs :
/mnt/workspace/fast_transformer$ polygraphy run triton_models/model-original.onnx --trt --onnxrt --fp16 --seed 123 --val-range input_ids:[0,1000] attention_mask:[1,1] token_type_ids:[1,1] --input-shapes input_ids:[1,16] attention_mask:[1,16] token_type_ids:[1,16] --workspace=12G --validate --warm-up 200 --iterations 1 --atol 1e-1 --onnx-outputs mark all --trt-outputs mark all --verbose
[V] Loaded Module: polygraphy.util | Path: ['/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/polygraphy/util']
[V] Model: triton_models/model-original.onnx
[V] Loaded Module: polygraphy | Version: 0.33.0 | Path: ['/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/polygraphy']
[V] Loaded Module: tensorrt | Version: 8.2.0.6 | Path: ['/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/tensorrt']
[I] Will generate inference input data according to provided TensorMetadata: {input_ids [shape=(1, 16)],
attention_mask [shape=(1, 16)],
token_type_ids [shape=(1, 16)]}
[I] trt-runner-N0-11/21/21-21:48:35 | Activating and starting inference
[11/21/2021-21:48:35] [TRT] [I] [MemUsageChange] Init CUDA: CPU +445, GPU +0, now: CPU 460, GPU 836 (MiB)
[11/21/2021-21:48:36] [TRT] [I] ----------------------------------------------------------------
[11/21/2021-21:48:36] [TRT] [I] Input filename: /mnt/workspace/fast_transformer/triton_models/model-original.onnx
[11/21/2021-21:48:36] [TRT] [I] ONNX IR version: 0.0.7
[11/21/2021-21:48:36] [TRT] [I] Opset version: 12
[11/21/2021-21:48:36] [TRT] [I] Producer name: pytorch
[11/21/2021-21:48:36] [TRT] [I] Producer version: 1.10
[11/21/2021-21:48:36] [TRT] [I] Domain:
[11/21/2021-21:48:36] [TRT] [I] Model version: 0
[11/21/2021-21:48:36] [TRT] [I] Doc string:
[11/21/2021-21:48:36] [TRT] [I] ----------------------------------------------------------------
[11/21/2021-21:48:36] [TRT] [W] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/21/2021-21:48:37] [TRT] [W] Output type must be INT32 for shape outputs
[11/21/2021-21:48:37] [TRT] [W] Output type must be INT32 for shape outputs
[11/21/2021-21:48:37] [TRT] [W] Output type must be INT32 for shape outputs
[11/21/2021-21:48:37] [TRT] [W] Output type must be INT32 for shape outputs
[V] Marking 677 tensors as outputs
[V] Setting TensorRT Optimization Profiles
[V] Input tensor: input_ids (dtype=DataType.INT32, shape=(-1, -1)) | Setting input tensor shapes to: (min=[1, 16], opt=[1, 16], max=[1, 16])
[V] Input tensor: token_type_ids (dtype=DataType.INT32, shape=(-1, -1)) | Setting input tensor shapes to: (min=[1, 16], opt=[1, 16], max=[1, 16])
[V] Input tensor: attention_mask (dtype=DataType.INT32, shape=(-1, -1)) | Setting input tensor shapes to: (min=[1, 16], opt=[1, 16], max=[1, 16])
[I] Configuring with profiles: [Profile().add(input_ids, min=[1, 16], opt=[1, 16], max=[1, 16]).add(attention_mask, min=[1, 16], opt=[1, 16], max=[1, 16]).add(token_type_ids, min=[1, 16], opt=[1, 16], max=[1, 16])]
[I] Building engine with configuration:
Workspace | 12884901888 bytes (12288.00 MiB)
Precision | TF32: False, FP16: True, INT8: False, Strict Types: False
Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
Safety Restricted | False
Profiles | 1 profile(s)
[11/21/2021-21:48:37] [TRT] [I] [MemUsageSnapshot] Builder begin: CPU 775 MiB, GPU 912 MiB
[11/21/2021-21:48:38] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +805, GPU +350, now: CPU 1581, GPU 1262 (MiB)
[11/21/2021-21:48:38] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1706, GPU 1320 (MiB)
[11/21/2021-21:48:38] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[11/21/2021-21:48:38] [TRT] [E] 2: [optimizer.cpp::getFormatRequirements::3815] Error Code 2: Internal Error (Assertion !n->candidateRequirements.empty() failed. no supported formats)
[11/21/2021-21:48:38] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::561] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
Steps To Reproduce
polygraphy run triton_models/model-original.onnx --trt --onnxrt \
--fp16 --seed 123 \
--val-range input_ids:[0,1000] attention_mask:[1,1] token_type_ids:[1,1] \
--input-shapes input_ids:[1,16] attention_mask:[1,16] token_type_ids:[1,16] \
--workspace=12G \
--validate \
--warm-up 200 \
--iterations 1 \
--atol 1e-1 \
--onnx-outputs mark all \
--trt-outputs mark all --verbose
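As noted in the description, the same command completes successfully once --trt-outputs mark all is removed (all other flags unchanged):

polygraphy run triton_models/model-original.onnx --trt --onnxrt \
--fp16 --seed 123 \
--val-range input_ids:[0,1000] attention_mask:[1,1] token_type_ids:[1,1] \
--input-shapes input_ids:[1,16] attention_mask:[1,16] token_type_ids:[1,16] \
--workspace=12G \
--validate \
--warm-up 200 \
--iterations 1 \
--atol 1e-1 \
--onnx-outputs mark all --verbose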