BUG: TRT engine built by trtexec produces completely different inference results than the input model

Description

I am trying to convert a model from torch-1.9 → ONNX → trt engine.

This all happens without issue, but when running inference on the TRT engine the result is completely different than expected.

I have verified that inference on the ONNX model gives the same result as the torch model, so the issue has to be in the ONNX → TensorRT conversion.
I am using trtexec to convert with the command:
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --workspace=1024

There are no errors during engine generation or during inference.
How do I figure out where this is going wrong?
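One thing that may be worth ruling out first: trtexec generates random input values unless it is given input files, so its output is only comparable to the torch output when both see the same input. Below is a minimal sketch, assuming the engine input is named `x` with float32 shape (2, 16, 1) as reported in the verbose log, and assuming the trtexec `--loadInputs` and `--dumpOutput` options behave as described in its help text. The file name `x.bin` and the values are placeholders.

```python
import struct

# Assumed from the verbose log: one input named "x", float32, shape (2, 16, 1).
INPUT_SHAPE = (2, 16, 1)

def write_raw_float32(path, values):
    """Write values as little-endian float32 with no header (raw binary)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%df" % len(values), *values))
    return len(values) * 4  # number of bytes written

# Placeholder data; replace with the flattened tensor fed to the torch model.
num_elements = 1
for dim in INPUT_SHAPE:
    num_elements *= dim
values = [0.01 * i for i in range(num_elements)]
bytes_written = write_raw_float32("x.bin", values)

# Then, on the device (flags assumed from trtexec --help):
#   /usr/src/tensorrt/bin/trtexec --loadEngine=model.trt --loadInputs=x:x.bin --dumpOutput
```

The dumped output can then be compared against the torch output for the same tensor.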

Environment

Running on Jetson Xavier with JetPack 4.6.
CUDA 10.2
TensorRT 8.0.1.6
ONNX 1.8.1

Relevant Files

model.onnx (2.7 KB)

Steps To Reproduce

The torch-to-ONNX export commands used are:

traced_model = torch.jit.trace(model, train_tensor)

torch.onnx.export(traced_model, train_tensor, ONNX_FILE_PATH, example_outputs=test_tensor, verbose=False, export_params=True)
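For reference, the torch/ONNX/TensorRT outputs can be compared elementwise with a tolerance rather than by eye. The following is a minimal sketch using only the standard library. The helper names `flatten` and `compare_outputs`, the tolerances, and the numbers are all hypothetical, and the outputs are assumed to have been converted to plain lists first (for example with `.tolist()`).

```python
import math

def flatten(nested):
    """Flatten arbitrarily nested lists/tuples of numbers into a flat list."""
    if isinstance(nested, (list, tuple)):
        out = []
        for item in nested:
            out.extend(flatten(item))
        return out
    return [float(nested)]

def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (max_abs_diff, all_close) for two non-empty outputs of equal size."""
    ref, cand = flatten(reference), flatten(candidate)
    if len(ref) != len(cand):
        raise ValueError("size mismatch: %d vs %d" % (len(ref), len(cand)))
    max_abs = max(abs(r - c) for r, c in zip(ref, cand))
    close = all(
        math.isclose(r, c, rel_tol=rtol, abs_tol=atol) for r, c in zip(ref, cand)
    )
    return max_abs, close

# Made-up numbers in a (2, 1) output shape, for illustration only.
torch_out = [[0.1234], [0.5678]]
onnx_out = [[0.1234001], [0.5678002]]
trt_out = [[0.9], [-0.3]]
print(compare_outputs(torch_out, onnx_out))
print(compare_outputs(torch_out, trt_out))
```

Reporting the maximum absolute difference alongside the pass/fail flag helps distinguish small floating-point drift from a genuinely different result.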

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx
filename = sys.argv[1]  # path to your ONNX model, e.g. model.onnx
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing issues, please share the trtexec --verbose log for further debugging.
Thanks!

Hello,
I have shared the model and trtexec command in the original post above.
onnx.checker outputs no errors.

The output of the verbose trtexec command is provided below.

&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --workspace=1024 --verbose
[11/08/2021-09:08:57] [I] === Model Options ===
[11/08/2021-09:08:57] [I] Format: ONNX
[11/08/2021-09:08:57] [I] Model: model.onnx
[11/08/2021-09:08:57] [I] Output:
[11/08/2021-09:08:57] [I] === Build Options ===
[11/08/2021-09:08:57] [I] Max batch: explicit
[11/08/2021-09:08:57] [I] Workspace: 1024 MiB
[11/08/2021-09:08:57] [I] minTiming: 1
[11/08/2021-09:08:57] [I] avgTiming: 8
[11/08/2021-09:08:57] [I] Precision: FP32
[11/08/2021-09:08:57] [I] Calibration: 
[11/08/2021-09:08:57] [I] Refit: Disabled
[11/08/2021-09:08:57] [I] Sparsity: Disabled
[11/08/2021-09:08:57] [I] Safe mode: Disabled
[11/08/2021-09:08:57] [I] Restricted mode: Disabled
[11/08/2021-09:08:57] [I] Save engine: model.trt
[11/08/2021-09:08:57] [I] Load engine: 
[11/08/2021-09:08:57] [I] NVTX verbosity: 0
[11/08/2021-09:08:57] [I] Tactic sources: Using default tactic sources
[11/08/2021-09:08:57] [I] timingCacheMode: local
[11/08/2021-09:08:57] [I] timingCacheFile: 
[11/08/2021-09:08:57] [I] Input(s)s format: fp32:CHW
[11/08/2021-09:08:57] [I] Output(s)s format: fp32:CHW
[11/08/2021-09:08:57] [I] Input build shapes: model
[11/08/2021-09:08:57] [I] Input calibration shapes: model
[11/08/2021-09:08:57] [I] === System Options ===
[11/08/2021-09:08:57] [I] Device: 0
[11/08/2021-09:08:57] [I] DLACore: 
[11/08/2021-09:08:57] [I] Plugins:
[11/08/2021-09:08:57] [I] === Inference Options ===
[11/08/2021-09:08:57] [I] Batch: Explicit
[11/08/2021-09:08:57] [I] Input inference shapes: model
[11/08/2021-09:08:57] [I] Iterations: 10
[11/08/2021-09:08:57] [I] Duration: 3s (+ 200ms warm up)
[11/08/2021-09:08:57] [I] Sleep time: 0ms
[11/08/2021-09:08:57] [I] Streams: 1
[11/08/2021-09:08:57] [I] ExposeDMA: Disabled
[11/08/2021-09:08:57] [I] Data transfers: Enabled
[11/08/2021-09:08:57] [I] Spin-wait: Disabled
[11/08/2021-09:08:57] [I] Multithreading: Disabled
[11/08/2021-09:08:57] [I] CUDA Graph: Disabled
[11/08/2021-09:08:57] [I] Separate profiling: Disabled
[11/08/2021-09:08:57] [I] Time Deserialize: Disabled
[11/08/2021-09:08:57] [I] Time Refit: Disabled
[11/08/2021-09:08:57] [I] Skip inference: Disabled
[11/08/2021-09:08:57] [I] Inputs:
[11/08/2021-09:08:57] [I] === Reporting Options ===
[11/08/2021-09:08:57] [I] Verbose: Enabled
[11/08/2021-09:08:57] [I] Averages: 10 inferences
[11/08/2021-09:08:57] [I] Percentile: 99
[11/08/2021-09:08:57] [I] Dump refittable layers:Disabled
[11/08/2021-09:08:57] [I] Dump output: Disabled
[11/08/2021-09:08:57] [I] Profile: Disabled
[11/08/2021-09:08:57] [I] Export timing to JSON file: 
[11/08/2021-09:08:57] [I] Export output to JSON file: 
[11/08/2021-09:08:57] [I] Export profile to JSON file: 
[11/08/2021-09:08:57] [I] 
[11/08/2021-09:08:57] [I] === Device Information ===
[11/08/2021-09:08:57] [I] Selected Device: Xavier
[11/08/2021-09:08:57] [I] Compute Capability: 7.2
[11/08/2021-09:08:57] [I] SMs: 6
[11/08/2021-09:08:57] [I] Compute Clock Rate: 1.109 GHz
[11/08/2021-09:08:57] [I] Device Global Memory: 7773 MiB
[11/08/2021-09:08:57] [I] Shared Memory per SM: 96 KiB
[11/08/2021-09:08:57] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/08/2021-09:08:57] [I] Memory Clock Rate: 1.109 GHz
[11/08/2021-09:08:57] [I] 
[11/08/2021-09:08:57] [I] TensorRT version: 8001
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::NMS_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::Reorg_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::Region_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::Clip_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::LReLU_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::PriorBox_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::Normalize_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::ScatterND version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::RPROI_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::FlattenConcat_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::CropAndResize version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::Proposal version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::Split version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[11/08/2021-09:08:57] [V] [TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[11/08/2021-09:08:59] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 7418 (MiB)
[11/08/2021-09:08:59] [I] Start parsing network model
[11/08/2021-09:08:59] [I] [TRT] ----------------------------------------------------------------
[11/08/2021-09:08:59] [I] [TRT] Input filename:   model.onnx
[11/08/2021-09:08:59] [I] [TRT] ONNX IR version:  0.0.6
[11/08/2021-09:08:59] [I] [TRT] Opset version:    9
[11/08/2021-09:08:59] [I] [TRT] Producer name:    pytorch
[11/08/2021-09:08:59] [I] [TRT] Producer version: 1.9
[11/08/2021-09:08:59] [I] [TRT] Domain:           
[11/08/2021-09:08:59] [I] [TRT] Model version:    0
[11/08/2021-09:08:59] [I] [TRT] Doc string:       
[11/08/2021-09:08:59] [I] [TRT] ----------------------------------------------------------------
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::GridAnchor_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::GridAnchorRect_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::NMS_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::Reorg_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::Region_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::Clip_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::LReLU_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::PriorBox_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::Normalize_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::ScatterND version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::RPROI_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::BatchedNMS_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::BatchedNMSDynamic_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::FlattenConcat_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::CropAndResize version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::DetectionLayer_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::EfficientNMS_ONNX_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::EfficientNMS_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::Proposal version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::ProposalLayer_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::PyramidROIAlign_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::ResizeNearest_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::Split version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::SpecialSlice_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Plugin creator already registered - ::InstanceNormalization_TRT version 1
[11/08/2021-09:08:59] [V] [TRT] Adding network input: x with dtype: float32, dimensions: (2, 16, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: x for ONNX tensor: x
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: fc.bias
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: fc.weight
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 108
[11/08/2021-09:08:59] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 109
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 127
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 128
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 129
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 147
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 148
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 149
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 167
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 168
[11/08/2021-09:08:59] [V] [TRT] Importing initializer: 169
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Shape_0 [Shape]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: x
[11/08/2021-09:08:59] [V] [TRT] Shape_0 [Shape] inputs: [x -> (2, 16, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Shape_0 for ONNX node: Shape_0
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 17 for ONNX tensor: 17
[11/08/2021-09:08:59] [V] [TRT] Shape_0 [Shape] outputs: [17 -> (3)[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Constant_1 [Constant]
[11/08/2021-09:08:59] [V] [TRT] Constant_1 [Constant] inputs: 
[11/08/2021-09:08:59] [V] [TRT] Constant_1 [Constant] outputs: [18 -> ()[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Gather_2 [Gather]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 17
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 18
[11/08/2021-09:08:59] [V] [TRT] Gather_2 [Gather] inputs: [17 -> (3)[INT32]], [18 -> ()[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 18 for ONNX node: 18
[11/08/2021-09:08:59] [V] [TRT] Using Gather axis: 0
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Gather_2 for ONNX node: Gather_2
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 19 for ONNX tensor: 19
[11/08/2021-09:08:59] [V] [TRT] Gather_2 [Gather] outputs: [19 -> ()[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Unsqueeze_3 [Unsqueeze]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 19
[11/08/2021-09:08:59] [V] [TRT] Unsqueeze_3 [Unsqueeze] inputs: [19 -> ()[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Original shape: (), unsqueezing to: (1,)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Unsqueeze_3 for ONNX node: Unsqueeze_3
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 21 for ONNX tensor: 21
[11/08/2021-09:08:59] [V] [TRT] Unsqueeze_3 [Unsqueeze] outputs: [21 -> (1)[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Concat_4 [Concat]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 108
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 21
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 109
[11/08/2021-09:08:59] [V] [TRT] Concat_4 [Concat] inputs: [108 -> (1)[INT32]], [21 -> (1)[INT32]], [109 -> (1)[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 108 for ONNX node: 108
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 109 for ONNX node: 109
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Concat_4 for ONNX node: Concat_4
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 23 for ONNX tensor: 23
[11/08/2021-09:08:59] [V] [TRT] Concat_4 [Concat] outputs: [23 -> (3)[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: ConstantOfShape_5 [ConstantOfShape]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 23
[11/08/2021-09:08:59] [V] [TRT] ConstantOfShape_5 [ConstantOfShape] inputs: [23 -> (3)[INT32]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: ConstantOfShape_5 for ONNX node: ConstantOfShape_5
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 24 for ONNX tensor: 24
[11/08/2021-09:08:59] [V] [TRT] ConstantOfShape_5 [ConstantOfShape] outputs: [24 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Transpose_6 [Transpose]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: x
[11/08/2021-09:08:59] [V] [TRT] Transpose_6 [Transpose] inputs: [x -> (2, 16, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Transpose_6 for ONNX node: Transpose_6
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 25 for ONNX tensor: 25
[11/08/2021-09:08:59] [V] [TRT] Transpose_6 [Transpose] outputs: [25 -> (16, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Slice_7 [Slice]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 24
[11/08/2021-09:08:59] [V] [TRT] Slice_7 [Slice] inputs: [24 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Slice_7 for ONNX node: Slice_7
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 47 for ONNX tensor: 47
[11/08/2021-09:08:59] [V] [TRT] Slice_7 [Slice] outputs: [47 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Slice_8 [Slice]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 24
[11/08/2021-09:08:59] [V] [TRT] Slice_8 [Slice] inputs: [24 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Slice_8 for ONNX node: Slice_8
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 48 for ONNX tensor: 48
[11/08/2021-09:08:59] [V] [TRT] Slice_8 [Slice] outputs: [48 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: LSTM_9 [LSTM]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 25
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 127
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 128
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 129
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 47
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 48
[11/08/2021-09:08:59] [V] [TRT] LSTM_9 [LSTM] inputs: [25 -> (16, 2, 1)[FLOAT]], [127 -> (1, 4, 1)[FLOAT]], [128 -> (1, 4, 1)[FLOAT]], [129 -> (1, 8)[FLOAT]], [optional input, not set], [47 -> (1, 2, 1)[FLOAT]], [48 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 127 for ONNX node: 127
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 128 for ONNX node: 128
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 129 for ONNX node: 129
[11/08/2021-09:08:59] [V] [TRT] Bias shape is: (1, 8)
[11/08/2021-09:08:59] [V] [TRT] Reshaping bias to: (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] After reduction, bias shape is: (1, 1, 4)
[11/08/2021-09:08:59] [V] [TRT] numDirectionsTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] hiddenSizeTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] batchSizeTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] Gate output rank (equal to initial hidden/cell state rank): (3)
[11/08/2021-09:08:59] [V] [TRT] Initial hidden state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Initial cell state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Entering Loop
[11/08/2021-09:08:59] [V] [TRT] Original shape: (2, 1), unsqueezing to: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Input shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: LSTM_9 for ONNX node: LSTM_9
[11/08/2021-09:08:59] [V] [TRT] Hidden state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Cell state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] X(t) * W^T -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] H(t-1) * R^T -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] intermediate(t) -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] c(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] C(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] H(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 49 for ONNX tensor: 49
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 50 for ONNX tensor: 50
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 51 for ONNX tensor: 51
[11/08/2021-09:08:59] [V] [TRT] LSTM_9 [LSTM] outputs: [49 -> (16, 1, 2, 1)[FLOAT]], [50 -> (1, 2, 1)[FLOAT]], [51 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Squeeze_10 [Squeeze]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 49
[11/08/2021-09:08:59] [V] [TRT] Squeeze_10 [Squeeze] inputs: [49 -> (16, 1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Original shape: (16, 1, 2, 1), squeezing to: (16, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Squeeze_10 for ONNX node: Squeeze_10
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 52 for ONNX tensor: 52
[11/08/2021-09:08:59] [V] [TRT] Squeeze_10 [Squeeze] outputs: [52 -> (16, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Slice_11 [Slice]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 24
[11/08/2021-09:08:59] [V] [TRT] Slice_11 [Slice] inputs: [24 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Slice_11 for ONNX node: Slice_11
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 73 for ONNX tensor: 73
[11/08/2021-09:08:59] [V] [TRT] Slice_11 [Slice] outputs: [73 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Slice_12 [Slice]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 24
[11/08/2021-09:08:59] [V] [TRT] Slice_12 [Slice] inputs: [24 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Slice_12 for ONNX node: Slice_12
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 74 for ONNX tensor: 74
[11/08/2021-09:08:59] [V] [TRT] Slice_12 [Slice] outputs: [74 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: LSTM_13 [LSTM]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 52
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 147
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 148
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 149
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 73
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 74
[11/08/2021-09:08:59] [V] [TRT] LSTM_13 [LSTM] inputs: [52 -> (16, 2, 1)[FLOAT]], [147 -> (1, 4, 1)[FLOAT]], [148 -> (1, 4, 1)[FLOAT]], [149 -> (1, 8)[FLOAT]], [optional input, not set], [73 -> (1, 2, 1)[FLOAT]], [74 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 147 for ONNX node: 147
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 148 for ONNX node: 148
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 149 for ONNX node: 149
[11/08/2021-09:08:59] [V] [TRT] Bias shape is: (1, 8)
[11/08/2021-09:08:59] [V] [TRT] Reshaping bias to: (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] After reduction, bias shape is: (1, 1, 4)
[11/08/2021-09:08:59] [V] [TRT] numDirectionsTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] hiddenSizeTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] batchSizeTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] Gate output rank (equal to initial hidden/cell state rank): (3)
[11/08/2021-09:08:59] [V] [TRT] Initial hidden state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Initial cell state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Entering Loop
[11/08/2021-09:08:59] [V] [TRT] Original shape: (2, 1), unsqueezing to: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Input shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: LSTM_13 for ONNX node: LSTM_13
[11/08/2021-09:08:59] [V] [TRT] Hidden state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Cell state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] X(t) * W^T -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] H(t-1) * R^T -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] intermediate(t) -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] c(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] C(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] H(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 75 for ONNX tensor: 75
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 76 for ONNX tensor: 76
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 77 for ONNX tensor: 77
[11/08/2021-09:08:59] [V] [TRT] LSTM_13 [LSTM] outputs: [75 -> (16, 1, 2, 1)[FLOAT]], [76 -> (1, 2, 1)[FLOAT]], [77 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Squeeze_14 [Squeeze]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 75
[11/08/2021-09:08:59] [V] [TRT] Squeeze_14 [Squeeze] inputs: [75 -> (16, 1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Original shape: (16, 1, 2, 1), squeezing to: (16, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Squeeze_14 for ONNX node: Squeeze_14
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 78 for ONNX tensor: 78
[11/08/2021-09:08:59] [V] [TRT] Squeeze_14 [Squeeze] outputs: [78 -> (16, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Slice_15 [Slice]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 24
[11/08/2021-09:08:59] [V] [TRT] Slice_15 [Slice] inputs: [24 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Slice_15 for ONNX node: Slice_15
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 99 for ONNX tensor: 99
[11/08/2021-09:08:59] [V] [TRT] Slice_15 [Slice] outputs: [99 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Slice_16 [Slice]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 24
[11/08/2021-09:08:59] [V] [TRT] Slice_16 [Slice] inputs: [24 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Slice_16 for ONNX node: Slice_16
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 100 for ONNX tensor: 100
[11/08/2021-09:08:59] [V] [TRT] Slice_16 [Slice] outputs: [100 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: LSTM_17 [LSTM]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 78
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 167
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 168
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 169
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 99
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 100
[11/08/2021-09:08:59] [V] [TRT] LSTM_17 [LSTM] inputs: [78 -> (16, 2, 1)[FLOAT]], [167 -> (1, 4, 1)[FLOAT]], [168 -> (1, 4, 1)[FLOAT]], [169 -> (1, 8)[FLOAT]], [optional input, not set], [99 -> (1, 2, 1)[FLOAT]], [100 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 167 for ONNX node: 167
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 168 for ONNX node: 168
[11/08/2021-09:08:59] [V] [TRT] Registering layer: 169 for ONNX node: 169
[11/08/2021-09:08:59] [V] [TRT] Bias shape is: (1, 8)
[11/08/2021-09:08:59] [V] [TRT] Reshaping bias to: (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] After reduction, bias shape is: (1, 1, 4)
[11/08/2021-09:08:59] [V] [TRT] numDirectionsTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] hiddenSizeTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] batchSizeTensor shape: (1)
[11/08/2021-09:08:59] [V] [TRT] Gate output rank (equal to initial hidden/cell state rank): (3)
[11/08/2021-09:08:59] [V] [TRT] Initial hidden state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Initial cell state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Entering Loop
[11/08/2021-09:08:59] [V] [TRT] Original shape: (2, 1), unsqueezing to: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Input shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: LSTM_17 for ONNX node: LSTM_17
[11/08/2021-09:08:59] [V] [TRT] Hidden state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Cell state shape: (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] X(t) * W^T -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] H(t-1) * R^T -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] intermediate(t) -> (1, 2, 4)
[11/08/2021-09:08:59] [V] [TRT] c(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] C(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] H(t) -> (1, 2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 101 for ONNX tensor: 101
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 102 for ONNX tensor: 102
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 103 for ONNX tensor: 103
[11/08/2021-09:08:59] [V] [TRT] LSTM_17 [LSTM] outputs: [101 -> (16, 1, 2, 1)[FLOAT]], [102 -> (1, 2, 1)[FLOAT]], [103 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Concat_18 [Concat]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 50
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 76
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 102
[11/08/2021-09:08:59] [V] [TRT] Concat_18 [Concat] inputs: [50 -> (1, 2, 1)[FLOAT]], [76 -> (1, 2, 1)[FLOAT]], [102 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Concat_18 for ONNX node: Concat_18
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 104 for ONNX tensor: 104
[11/08/2021-09:08:59] [V] [TRT] Concat_18 [Concat] outputs: [104 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Slice_19 [Slice]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 104
[11/08/2021-09:08:59] [V] [TRT] Slice_19 [Slice] inputs: [104 -> (3, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Slice_19 for ONNX node: Slice_19
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 105 for ONNX tensor: 105
[11/08/2021-09:08:59] [V] [TRT] Slice_19 [Slice] outputs: [105 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Squeeze_20 [Squeeze]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 105
[11/08/2021-09:08:59] [V] [TRT] Squeeze_20 [Squeeze] inputs: [105 -> (1, 2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Original shape: (1, 2, 1), squeezing to: (2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Squeeze_20 for ONNX node: Squeeze_20
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 106 for ONNX tensor: 106
[11/08/2021-09:08:59] [V] [TRT] Squeeze_20 [Squeeze] outputs: [106 -> (2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Parsing node: Gemm_21 [Gemm]
[11/08/2021-09:08:59] [V] [TRT] Searching for input: 106
[11/08/2021-09:08:59] [V] [TRT] Searching for input: fc.weight
[11/08/2021-09:08:59] [V] [TRT] Searching for input: fc.bias
[11/08/2021-09:08:59] [V] [TRT] Gemm_21 [Gemm] inputs: [106 -> (2, 1)[FLOAT]], [fc.weight -> (1, 1)[FLOAT]], [fc.bias -> (1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] GEMM: using FC layer instead of MM because all criteria were met.
[11/08/2021-09:08:59] [V] [TRT] Original shape: (2, 1), unsqueezing to: (2, 1, 1, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering layer: Gemm_21 for ONNX node: Gemm_21
[11/08/2021-09:08:59] [V] [TRT] Original shape: (2, 1, 1, 1), squeezing to: (2, 1)
[11/08/2021-09:08:59] [V] [TRT] Registering tensor: 107_0 for ONNX tensor: 107
[11/08/2021-09:08:59] [V] [TRT] Gemm_21 [Gemm] outputs: [107 -> (2, 1)[FLOAT]], 
[11/08/2021-09:08:59] [V] [TRT] Marking 107_0 as output: 107
[11/08/2021-09:08:59] [I] Finish parsing network model
[11/08/2021-09:08:59] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 372, GPU 7418 (MiB)
[11/08/2021-09:08:59] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 372 MiB, GPU 7418 MiB
[11/08/2021-09:08:59] [V] [TRT] Applying generic optimizations to the graph for inference.
[11/08/2021-09:08:59] [V] [TRT] Original: 107 layers
[11/08/2021-09:08:59] [V] [TRT] After dead-layer removal: 107 layers
[11/08/2021-09:08:59] [V] [TRT] ConstShuffleFusion: Fusing (Unnamed Layer* 7) [Constant] with (Unnamed Layer* 8) [Shuffle]
[11/08/2021-09:08:59] [V] [TRT] ConstShuffleFusion: Fusing 129 with (Unnamed Layer* 17) [Shuffle]
[11/08/2021-09:08:59] [V] [TRT] ConstShuffleFusion: Fusing 149 with (Unnamed Layer* 62) [Shuffle]
[11/08/2021-09:08:59] [V] [TRT] ConstShuffleFusion: Fusing 169 with (Unnamed Layer* 107) [Shuffle]
[11/08/2021-09:08:59] [V] [TRT] ShuffleShuffleFusion: Fusing Squeeze_20 with (Unnamed Layer* 149) [Shuffle]
[11/08/2021-09:08:59] [V] [TRT] After Myelin optimization: 1 layers
[11/08/2021-09:08:59] [V] [TRT] After scale fusion: 1 layers
[11/08/2021-09:08:59] [V] [TRT] After vertical fusions: 1 layers
[11/08/2021-09:08:59] [V] [TRT] After dupe layer removal: 1 layers
[11/08/2021-09:08:59] [V] [TRT] After final dead-layer removal: 1 layers
[11/08/2021-09:08:59] [V] [TRT] After tensor merging: 1 layers
[11/08/2021-09:08:59] [V] [TRT] After concat removal: 1 layers
[11/08/2021-09:08:59] [V] [TRT] Graph construction and optimization completed in 0.00975795 seconds.
[11/08/2021-09:08:59] [I] [TRT] ---------- Layers Running on DLA ----------
[11/08/2021-09:08:59] [I] [TRT] ---------- Layers Running on GPU ----------
[11/08/2021-09:08:59] [I] [TRT] [GpuLayer] {ForeignNode[127...(Unnamed Layer* 151) [Shuffle]]}
[11/08/2021-09:09:00] [V] [TRT] Using cublas a tactic source
[11/08/2021-09:09:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +226, GPU +235, now: CPU 598, GPU 7653 (MiB)
[11/08/2021-09:09:00] [V] [TRT] Using cuDNN as a tactic source
[11/08/2021-09:09:02] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +10, now: CPU 905, GPU 7663 (MiB)
[11/08/2021-09:09:02] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[11/08/2021-09:09:02] [V] [TRT] Constructing optimization profile number 0 [1/1].
[11/08/2021-09:09:02] [V] [TRT] *************** Autotuning format combination: Float(16,1,1) -> Float(1,1) ***************
[11/08/2021-09:09:02] [V] [TRT] --------------- Timing Runner: {ForeignNode[127...(Unnamed Layer* 151) [Shuffle]]} (Myelin)
[11/08/2021-09:09:09] [V] [TRT] Tactic: 0 is the only option, timing skipped
[11/08/2021-09:09:09] [V] [TRT] Fastest Tactic: 0 Time: 0
[11/08/2021-09:09:09] [V] [TRT] Formats and tactics selection completed in 7.60474 seconds.
[11/08/2021-09:09:09] [V] [TRT] After reformat layers: 1 layers
[11/08/2021-09:09:09] [V] [TRT] Block size 1073741824
[11/08/2021-09:09:09] [V] [TRT] Total Activation Memory: 1073741824
[11/08/2021-09:09:09] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[11/08/2021-09:09:16] [V] [TRT] Layer: {ForeignNode[127...(Unnamed Layer* 151) [Shuffle]]} HostPersistent: 32 DevicePersistent: 0
[11/08/2021-09:09:16] [I] [TRT] Total Host Persistent Memory: 32
[11/08/2021-09:09:16] [I] [TRT] Total Device Persistent Memory: 0
[11/08/2021-09:09:16] [I] [TRT] Total Scratch Memory: 2400
[11/08/2021-09:09:16] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[11/08/2021-09:09:16] [V] [TRT] Using cublas a tactic source
[11/08/2021-09:09:16] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +7, now: CPU 922, GPU 7692 (MiB)
[11/08/2021-09:09:16] [V] [TRT] Using cuDNN as a tactic source
[11/08/2021-09:09:16] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +5, now: CPU 923, GPU 7697 (MiB)
[11/08/2021-09:09:16] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 922, GPU 7696 (MiB)
[11/08/2021-09:09:16] [V] [TRT] Engine generation completed in 17.3778 seconds.
[11/08/2021-09:09:16] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 922, GPU 7694 (MiB)
[11/08/2021-09:09:16] [V] [TRT] Engine Layer Information:
Layer(Myelin): {ForeignNode[127...(Unnamed Layer* 151) [Shuffle]]}, Tactic: 0, x[Float(2,16,1)] -> 107[Float(2,1)]
[11/08/2021-09:09:16] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 922 MiB, GPU 7694 MiB
[11/08/2021-09:09:17] [I] [TRT] Loaded engine size: 34 MB
[11/08/2021-09:09:17] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 949 MiB, GPU 7663 MiB
[11/08/2021-09:09:17] [V] [TRT] Using cublas a tactic source
[11/08/2021-09:09:17] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 956, GPU 7662 (MiB)
[11/08/2021-09:09:17] [V] [TRT] Using cuDNN as a tactic source
[11/08/2021-09:09:17] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 956, GPU 7662 (MiB)
[11/08/2021-09:09:17] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 956, GPU 7662 (MiB)
[11/08/2021-09:09:17] [V] [TRT] Deserialization required 221081 microseconds.
[11/08/2021-09:09:17] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 956 MiB, GPU 7662 MiB
[11/08/2021-09:09:18] [I] Engine built in 20.5209 sec.
[11/08/2021-09:09:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 921 MiB, GPU 7660 MiB
[11/08/2021-09:09:18] [V] [TRT] Using cublas a tactic source
[11/08/2021-09:09:18] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 922, GPU 7660 (MiB)
[11/08/2021-09:09:18] [V] [TRT] Using cuDNN as a tactic source
[11/08/2021-09:09:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 922, GPU 7660 (MiB)
[11/08/2021-09:09:18] [V] [TRT] Total per-runner device memory is 0
[11/08/2021-09:09:18] [V] [TRT] Total per-runner host memory is 32
[11/08/2021-09:09:18] [V] [TRT] Allocated activation device memory of size 2560
[11/08/2021-09:09:18] [V] [TRT] myelinAllocCb allocated GPU (data-constants) 8 bytes at 0x20d39a000.
[11/08/2021-09:09:18] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 933 MiB, GPU 7660 MiB
[11/08/2021-09:09:18] [I] Created input binding for x with dimensions 2x16x1
[11/08/2021-09:09:18] [I] Created output binding for 107 with dimensions 2x1
[11/08/2021-09:09:18] [I] Starting inference
[11/08/2021-09:09:18] [V] [TRT] myelinAllocCb allocated GPU 2184 bytes at 0x20d3dcc00.
[11/08/2021-09:09:18] [V] [TRT] myelinAllocCb allocated CPU 4100 bytes at 0x7f3c002f00.
[11/08/2021-09:09:21] [I] Warmup completed 61 queries over 200 ms
[11/08/2021-09:09:21] [I] Timing trace has 3753 queries over 3.00109 s
[11/08/2021-09:09:21] [I] 
[11/08/2021-09:09:21] [I] === Trace details ===
[11/08/2021-09:09:21] [I] Trace averages of 10 runs:
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 2.18166 ms - Host latency: 2.19386 ms (end to end 2.21095 ms, enqueue 1.15731 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 2.20305 ms - Host latency: 2.21529 ms (end to end 2.23222 ms, enqueue 0.985371 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 2.18222 ms - Host latency: 2.19441 ms (end to end 2.21095 ms, enqueue 1.04016 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 2.2098 ms - Host latency: 2.22208 ms (end to end 2.23813 ms, enqueue 0.952585 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 2.20888 ms - Host latency: 2.22108 ms (end to end 2.23524 ms, enqueue 0.953934 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.64207 ms - Host latency: 1.65122 ms (end to end 1.66254 ms, enqueue 0.891766 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.52362 ms - Host latency: 1.53221 ms (end to end 1.54399 ms, enqueue 0.787271 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.52586 ms - Host latency: 1.53441 ms (end to end 1.54738 ms, enqueue 0.783243 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.52984 ms - Host latency: 1.53844 ms (end to end 1.54902 ms, enqueue 0.746066 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.53047 ms - Host latency: 1.53894 ms (end to end 1.5507 ms, enqueue 0.731345 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.53896 ms - Host latency: 1.54753 ms (end to end 1.55782 ms, enqueue 0.700711 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.53806 ms - Host latency: 1.54642 ms (end to end 1.5572 ms, enqueue 0.689001 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.54131 ms - Host latency: 1.54981 ms (end to end 1.56328 ms, enqueue 0.716205 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.54686 ms - Host latency: 1.55564 ms (end to end 1.56857 ms, enqueue 0.679144 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.54471 ms - Host latency: 1.55312 ms (end to end 1.56384 ms, enqueue 0.672046 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.53733 ms - Host latency: 1.54587 ms (end to end 1.5599 ms, enqueue 0.698676 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.5364 ms - Host latency: 1.54499 ms (end to end 1.55699 ms, enqueue 0.684363 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.54409 ms - Host latency: 1.55253 ms (end to end 1.56308 ms, enqueue 0.67955 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.46789 ms - Host latency: 1.47598 ms (end to end 1.48743 ms, enqueue 0.702734 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.18398 ms - Host latency: 1.19073 ms (end to end 1.20108 ms, enqueue 0.685303 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.19368 ms - Host latency: 1.20054 ms (end to end 1.21057 ms, enqueue 0.715393 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.11283 ms - Host latency: 1.11968 ms (end to end 1.12991 ms, enqueue 0.793304 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.19112 ms - Host latency: 1.198 ms (end to end 1.20806 ms, enqueue 0.684296 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.19278 ms - Host latency: 1.19956 ms (end to end 1.20894 ms, enqueue 0.677771 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.19197 ms - Host latency: 1.19875 ms (end to end 1.21023 ms, enqueue 0.680725 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.16359 ms - Host latency: 1.17038 ms (end to end 1.18109 ms, enqueue 0.660382 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.18466 ms - Host latency: 1.19144 ms (end to end 1.20232 ms, enqueue 0.724322 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.18542 ms - Host latency: 1.1922 ms (end to end 1.20411 ms, enqueue 0.707043 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.18724 ms - Host latency: 1.19396 ms (end to end 1.20327 ms, enqueue 0.702777 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.19184 ms - Host latency: 1.19865 ms (end to end 1.2106 ms, enqueue 0.675903 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.19359 ms - Host latency: 1.20053 ms (end to end 1.21092 ms, enqueue 0.670044 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.854547 ms - Host latency: 0.860199 ms (end to end 0.8685 ms, enqueue 0.72312 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.00472 ms - Host latency: 1.01039 ms (end to end 1.01958 ms, enqueue 0.797607 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.980072 ms - Host latency: 0.985809 ms (end to end 0.995489 ms, enqueue 0.665588 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 1.03211 ms - Host latency: 1.03782 ms (end to end 1.04656 ms, enqueue 0.834467 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.972015 ms - Host latency: 0.977759 ms (end to end 0.987756 ms, enqueue 0.709277 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.948804 ms - Host latency: 0.954645 ms (end to end 0.964868 ms, enqueue 0.71402 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.983691 ms - Host latency: 0.989349 ms (end to end 0.999182 ms, enqueue 0.663879 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.982916 ms - Host latency: 0.988672 ms (end to end 0.998413 ms, enqueue 0.666138 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.985474 ms - Host latency: 0.991333 ms (end to end 1.0015 ms, enqueue 0.662543 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.984418 ms - Host latency: 0.9901 ms (end to end 1.00071 ms, enqueue 0.660455 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.97959 ms - Host latency: 0.985297 ms (end to end 0.994434 ms, enqueue 0.689697 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.981714 ms - Host latency: 0.987506 ms (end to end 0.997919 ms, enqueue 0.673865 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.984314 ms - Host latency: 0.98996 ms (end to end 0.999554 ms, enqueue 0.661255 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.983276 ms - Host latency: 0.988922 ms (end to end 0.99837 ms, enqueue 0.66795 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.984802 ms - Host latency: 0.990454 ms (end to end 0.999622 ms, enqueue 0.65592 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.942194 ms - Host latency: 0.947754 ms (end to end 0.956451 ms, enqueue 0.683136 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.833575 ms - Host latency: 0.838654 ms (end to end 0.848468 ms, enqueue 0.670062 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.860419 ms - Host latency: 0.865509 ms (end to end 0.874377 ms, enqueue 0.659259 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.859595 ms - Host latency: 0.8646 ms (end to end 0.873138 ms, enqueue 0.664795 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.859058 ms - Host latency: 0.864209 ms (end to end 0.873163 ms, enqueue 0.653448 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.858765 ms - Host latency: 0.863721 ms (end to end 0.872925 ms, enqueue 0.669006 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.888635 ms - Host latency: 0.893896 ms (end to end 0.903717 ms, enqueue 0.77547 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.857056 ms - Host latency: 0.86225 ms (end to end 0.87179 ms, enqueue 0.678192 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.85722 ms - Host latency: 0.862439 ms (end to end 0.872461 ms, enqueue 0.670245 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.856195 ms - Host latency: 0.86106 ms (end to end 0.869916 ms, enqueue 0.67738 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.860315 ms - Host latency: 0.865466 ms (end to end 0.875275 ms, enqueue 0.659229 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.860754 ms - Host latency: 0.865875 ms (end to end 0.874127 ms, enqueue 0.651123 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.929675 ms - Host latency: 0.936334 ms (end to end 0.945605 ms, enqueue 0.765967 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.817999 ms - Host latency: 0.831207 ms (end to end 0.841138 ms, enqueue 0.688007 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.742188 ms - Host latency: 0.767737 ms (end to end 0.776947 ms, enqueue 0.703772 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.735773 ms - Host latency: 0.763013 ms (end to end 0.772522 ms, enqueue 0.730298 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.708496 ms - Host latency: 0.73338 ms (end to end 0.742676 ms, enqueue 0.704132 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.77713 ms - Host latency: 0.802112 ms (end to end 0.811725 ms, enqueue 0.771313 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.661121 ms - Host latency: 0.683441 ms (end to end 0.692126 ms, enqueue 0.656238 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.739722 ms - Host latency: 0.764459 ms (end to end 0.773975 ms, enqueue 0.735333 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.716565 ms - Host latency: 0.744287 ms (end to end 0.753723 ms, enqueue 0.712476 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.664673 ms - Host latency: 0.687518 ms (end to end 0.69859 ms, enqueue 0.66073 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.671307 ms - Host latency: 0.697491 ms (end to end 0.708185 ms, enqueue 0.667401 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.692169 ms - Host latency: 0.714343 ms (end to end 0.725012 ms, enqueue 0.688001 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.656165 ms - Host latency: 0.679065 ms (end to end 0.690454 ms, enqueue 0.652441 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.672314 ms - Host latency: 0.694507 ms (end to end 0.706396 ms, enqueue 0.668103 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.760571 ms - Host latency: 0.787927 ms (end to end 0.797302 ms, enqueue 0.756506 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663025 ms - Host latency: 0.685742 ms (end to end 0.694897 ms, enqueue 0.65896 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.668481 ms - Host latency: 0.691711 ms (end to end 0.701086 ms, enqueue 0.664697 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.80116 ms - Host latency: 0.824377 ms (end to end 0.833948 ms, enqueue 0.7927 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.683838 ms - Host latency: 0.711108 ms (end to end 0.72063 ms, enqueue 0.679517 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.664636 ms - Host latency: 0.68905 ms (end to end 0.698169 ms, enqueue 0.660632 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.683728 ms - Host latency: 0.708484 ms (end to end 0.717725 ms, enqueue 0.679871 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.745618 ms - Host latency: 0.770508 ms (end to end 0.780762 ms, enqueue 0.741382 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.672693 ms - Host latency: 0.697925 ms (end to end 0.708508 ms, enqueue 0.668726 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.659497 ms - Host latency: 0.681592 ms (end to end 0.69198 ms, enqueue 0.655591 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.684106 ms - Host latency: 0.706299 ms (end to end 0.715625 ms, enqueue 0.680249 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.660754 ms - Host latency: 0.685864 ms (end to end 0.695264 ms, enqueue 0.655872 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.654053 ms - Host latency: 0.676086 ms (end to end 0.685242 ms, enqueue 0.65033 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.700623 ms - Host latency: 0.72583 ms (end to end 0.736389 ms, enqueue 0.696692 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.672913 ms - Host latency: 0.748877 ms (end to end 0.758154 ms, enqueue 0.668762 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.661499 ms - Host latency: 0.683862 ms (end to end 0.69469 ms, enqueue 0.657617 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.695374 ms - Host latency: 0.719897 ms (end to end 0.729504 ms, enqueue 0.677112 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.711792 ms - Host latency: 0.736499 ms (end to end 0.746106 ms, enqueue 0.707678 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.677393 ms - Host latency: 0.703406 ms (end to end 0.71272 ms, enqueue 0.673413 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.797253 ms - Host latency: 0.819568 ms (end to end 0.830212 ms, enqueue 0.793225 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.658899 ms - Host latency: 0.686523 ms (end to end 0.697217 ms, enqueue 0.654968 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.686426 ms - Host latency: 0.712634 ms (end to end 0.722119 ms, enqueue 0.682007 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.672278 ms - Host latency: 0.696631 ms (end to end 0.705847 ms, enqueue 0.666882 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.68385 ms - Host latency: 0.711768 ms (end to end 0.721094 ms, enqueue 0.680066 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663135 ms - Host latency: 0.690137 ms (end to end 0.699121 ms, enqueue 0.659216 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.662646 ms - Host latency: 0.685889 ms (end to end 0.695032 ms, enqueue 0.658325 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.709717 ms - Host latency: 0.732483 ms (end to end 0.744983 ms, enqueue 0.705518 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.664185 ms - Host latency: 0.686401 ms (end to end 0.696875 ms, enqueue 0.660315 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.675085 ms - Host latency: 0.699048 ms (end to end 0.708618 ms, enqueue 0.670947 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.68645 ms - Host latency: 0.709436 ms (end to end 0.721094 ms, enqueue 0.682166 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.66311 ms - Host latency: 0.688574 ms (end to end 0.699499 ms, enqueue 0.659082 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.664197 ms - Host latency: 0.687146 ms (end to end 0.69646 ms, enqueue 0.660352 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.692554 ms - Host latency: 0.719287 ms (end to end 0.729968 ms, enqueue 0.688684 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.660645 ms - Host latency: 0.683716 ms (end to end 0.692712 ms, enqueue 0.656628 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.667029 ms - Host latency: 0.69209 ms (end to end 0.701538 ms, enqueue 0.661707 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.864001 ms - Host latency: 0.894177 ms (end to end 0.904199 ms, enqueue 0.857092 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.658813 ms - Host latency: 0.685181 ms (end to end 0.69436 ms, enqueue 0.654797 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.664075 ms - Host latency: 0.692175 ms (end to end 0.701624 ms, enqueue 0.657397 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.657397 ms - Host latency: 0.680322 ms (end to end 0.69093 ms, enqueue 0.653516 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.667969 ms - Host latency: 0.693164 ms (end to end 0.70249 ms, enqueue 0.66416 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.724963 ms - Host latency: 0.745874 ms (end to end 0.755334 ms, enqueue 0.71499 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.73877 ms - Host latency: 0.743384 ms (end to end 0.750806 ms, enqueue 0.673035 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.760803 ms - Host latency: 0.765466 ms (end to end 0.77273 ms, enqueue 0.651367 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.698218 ms - Host latency: 0.713794 ms (end to end 0.722913 ms, enqueue 0.737341 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.653186 ms - Host latency: 0.677124 ms (end to end 0.687366 ms, enqueue 0.649011 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.657666 ms - Host latency: 0.68656 ms (end to end 0.695947 ms, enqueue 0.65354 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.666602 ms - Host latency: 0.711218 ms (end to end 0.721875 ms, enqueue 0.662512 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.693506 ms - Host latency: 0.719678 ms (end to end 0.729358 ms, enqueue 0.688708 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.658313 ms - Host latency: 0.684741 ms (end to end 0.694055 ms, enqueue 0.654504 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.669153 ms - Host latency: 0.695691 ms (end to end 0.706751 ms, enqueue 0.663843 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.807104 ms - Host latency: 0.829712 ms (end to end 0.840076 ms, enqueue 0.80282 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.66012 ms - Host latency: 0.681812 ms (end to end 0.691016 ms, enqueue 0.65625 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.672815 ms - Host latency: 0.697815 ms (end to end 0.706689 ms, enqueue 0.66875 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.67511 ms - Host latency: 0.697498 ms (end to end 0.711328 ms, enqueue 0.669824 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.778516 ms - Host latency: 0.806055 ms (end to end 0.81571 ms, enqueue 0.772583 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.666187 ms - Host latency: 0.691003 ms (end to end 0.700232 ms, enqueue 0.662085 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.7073 ms - Host latency: 0.730469 ms (end to end 0.741907 ms, enqueue 0.703064 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.674438 ms - Host latency: 0.699341 ms (end to end 0.710974 ms, enqueue 0.669055 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.657336 ms - Host latency: 0.680676 ms (end to end 0.689905 ms, enqueue 0.653467 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663965 ms - Host latency: 0.686682 ms (end to end 0.695789 ms, enqueue 0.659998 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.656787 ms - Host latency: 0.678784 ms (end to end 0.688001 ms, enqueue 0.652954 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.654529 ms - Host latency: 0.685474 ms (end to end 0.695044 ms, enqueue 0.648022 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.651526 ms - Host latency: 0.674622 ms (end to end 0.684863 ms, enqueue 0.647595 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.661072 ms - Host latency: 0.689453 ms (end to end 0.698682 ms, enqueue 0.657043 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.708667 ms - Host latency: 0.73175 ms (end to end 0.743066 ms, enqueue 0.704517 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.662817 ms - Host latency: 0.68844 ms (end to end 0.699121 ms, enqueue 0.658789 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.657581 ms - Host latency: 0.681262 ms (end to end 0.690405 ms, enqueue 0.653674 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.662219 ms - Host latency: 0.688647 ms (end to end 0.697717 ms, enqueue 0.658264 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.659631 ms - Host latency: 0.683887 ms (end to end 0.694458 ms, enqueue 0.655798 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.678381 ms - Host latency: 0.700745 ms (end to end 0.710315 ms, enqueue 0.672668 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.668091 ms - Host latency: 0.693054 ms (end to end 0.702441 ms, enqueue 0.66405 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.661011 ms - Host latency: 0.684717 ms (end to end 0.695581 ms, enqueue 0.657117 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.65249 ms - Host latency: 0.675842 ms (end to end 0.686548 ms, enqueue 0.648743 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.658679 ms - Host latency: 0.682141 ms (end to end 0.69259 ms, enqueue 0.654651 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655127 ms - Host latency: 0.676733 ms (end to end 0.687024 ms, enqueue 0.651306 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655334 ms - Host latency: 0.68075 ms (end to end 0.689819 ms, enqueue 0.650378 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.688135 ms - Host latency: 0.71803 ms (end to end 0.727356 ms, enqueue 0.684131 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.696802 ms - Host latency: 0.721265 ms (end to end 0.730896 ms, enqueue 0.691357 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.8255 ms - Host latency: 0.854419 ms (end to end 0.86532 ms, enqueue 0.817444 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.722644 ms - Host latency: 0.749341 ms (end to end 0.758948 ms, enqueue 0.718335 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.749414 ms - Host latency: 0.776648 ms (end to end 0.78645 ms, enqueue 0.74519 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.679089 ms - Host latency: 0.704065 ms (end to end 0.716052 ms, enqueue 0.674878 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.734363 ms - Host latency: 0.763525 ms (end to end 0.773267 ms, enqueue 0.729663 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.681323 ms - Host latency: 0.706812 ms (end to end 0.716406 ms, enqueue 0.677319 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.691956 ms - Host latency: 0.72063 ms (end to end 0.744287 ms, enqueue 0.687903 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655078 ms - Host latency: 0.677991 ms (end to end 0.688818 ms, enqueue 0.651233 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663635 ms - Host latency: 0.690649 ms (end to end 0.700024 ms, enqueue 0.659778 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650708 ms - Host latency: 0.674146 ms (end to end 0.683264 ms, enqueue 0.646826 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655151 ms - Host latency: 0.680371 ms (end to end 0.689612 ms, enqueue 0.651392 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.649439 ms - Host latency: 0.671155 ms (end to end 0.681775 ms, enqueue 0.644214 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.660913 ms - Host latency: 0.684998 ms (end to end 0.694043 ms, enqueue 0.656958 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.682776 ms - Host latency: 0.70636 ms (end to end 0.715747 ms, enqueue 0.678601 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.740503 ms - Host latency: 0.774353 ms (end to end 0.785608 ms, enqueue 0.73457 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.773157 ms - Host latency: 0.798669 ms (end to end 0.810071 ms, enqueue 0.763892 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663 ms - Host latency: 0.68999 ms (end to end 0.699426 ms, enqueue 0.658801 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.658728 ms - Host latency: 0.681116 ms (end to end 0.692102 ms, enqueue 0.654822 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.659265 ms - Host latency: 0.681958 ms (end to end 0.690918 ms, enqueue 0.655298 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.675671 ms - Host latency: 0.701892 ms (end to end 0.714478 ms, enqueue 0.671301 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.664587 ms - Host latency: 0.692041 ms (end to end 0.702856 ms, enqueue 0.660547 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.654089 ms - Host latency: 0.678235 ms (end to end 0.68728 ms, enqueue 0.649756 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.653809 ms - Host latency: 0.67821 ms (end to end 0.687268 ms, enqueue 0.649854 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.64978 ms - Host latency: 0.673547 ms (end to end 0.682556 ms, enqueue 0.644226 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650427 ms - Host latency: 0.677441 ms (end to end 0.686475 ms, enqueue 0.645923 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.644958 ms - Host latency: 0.667126 ms (end to end 0.676013 ms, enqueue 0.641089 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655347 ms - Host latency: 0.676831 ms (end to end 0.68573 ms, enqueue 0.651209 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.668506 ms - Host latency: 0.690479 ms (end to end 0.700769 ms, enqueue 0.664526 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.652051 ms - Host latency: 0.677649 ms (end to end 0.686694 ms, enqueue 0.647913 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.643127 ms - Host latency: 0.66582 ms (end to end 0.674841 ms, enqueue 0.639099 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.648584 ms - Host latency: 0.673572 ms (end to end 0.682532 ms, enqueue 0.644727 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.653906 ms - Host latency: 0.677307 ms (end to end 0.686182 ms, enqueue 0.648791 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645837 ms - Host latency: 0.667664 ms (end to end 0.676465 ms, enqueue 0.642017 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.649048 ms - Host latency: 0.675928 ms (end to end 0.685986 ms, enqueue 0.645068 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.648279 ms - Host latency: 0.672009 ms (end to end 0.681006 ms, enqueue 0.644495 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.698242 ms - Host latency: 0.72196 ms (end to end 0.733228 ms, enqueue 0.693835 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650964 ms - Host latency: 0.674841 ms (end to end 0.685815 ms, enqueue 0.646875 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645605 ms - Host latency: 0.668188 ms (end to end 0.677246 ms, enqueue 0.641858 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650574 ms - Host latency: 0.673706 ms (end to end 0.682654 ms, enqueue 0.646558 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.656592 ms - Host latency: 0.679492 ms (end to end 0.689441 ms, enqueue 0.652502 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655469 ms - Host latency: 0.676782 ms (end to end 0.686963 ms, enqueue 0.65127 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.659021 ms - Host latency: 0.680395 ms (end to end 0.691089 ms, enqueue 0.655042 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.66355 ms - Host latency: 0.721582 ms (end to end 0.731323 ms, enqueue 0.657275 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650439 ms - Host latency: 0.672522 ms (end to end 0.681689 ms, enqueue 0.646362 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.653137 ms - Host latency: 0.693152 ms (end to end 0.702283 ms, enqueue 0.649023 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.648486 ms - Host latency: 0.674341 ms (end to end 0.683374 ms, enqueue 0.644519 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645886 ms - Host latency: 0.666907 ms (end to end 0.677295 ms, enqueue 0.642004 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650964 ms - Host latency: 0.676038 ms (end to end 0.684985 ms, enqueue 0.646997 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646948 ms - Host latency: 0.669214 ms (end to end 0.678333 ms, enqueue 0.643091 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.685571 ms - Host latency: 0.711316 ms (end to end 0.720654 ms, enqueue 0.681702 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646655 ms - Host latency: 0.668506 ms (end to end 0.680713 ms, enqueue 0.64281 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.639526 ms - Host latency: 0.666589 ms (end to end 0.675537 ms, enqueue 0.635584 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663672 ms - Host latency: 0.687451 ms (end to end 0.69668 ms, enqueue 0.659668 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.637439 ms - Host latency: 0.658826 ms (end to end 0.667737 ms, enqueue 0.633337 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647327 ms - Host latency: 0.669519 ms (end to end 0.678686 ms, enqueue 0.642212 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.63783 ms - Host latency: 0.658691 ms (end to end 0.669128 ms, enqueue 0.633911 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.649097 ms - Host latency: 0.673682 ms (end to end 0.683264 ms, enqueue 0.645068 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.670068 ms - Host latency: 0.692187 ms (end to end 0.70144 ms, enqueue 0.665979 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646692 ms - Host latency: 0.670959 ms (end to end 0.681091 ms, enqueue 0.642603 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.652649 ms - Host latency: 0.675281 ms (end to end 0.684387 ms, enqueue 0.648791 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646814 ms - Host latency: 0.669592 ms (end to end 0.67876 ms, enqueue 0.642712 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.654272 ms - Host latency: 0.681238 ms (end to end 0.691663 ms, enqueue 0.650232 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.708057 ms - Host latency: 0.731885 ms (end to end 0.741186 ms, enqueue 0.702808 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.716431 ms - Host latency: 0.741162 ms (end to end 0.750293 ms, enqueue 0.712378 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.767114 ms - Host latency: 0.792529 ms (end to end 0.801953 ms, enqueue 0.762842 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.707129 ms - Host latency: 0.729346 ms (end to end 0.741431 ms, enqueue 0.703149 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.722559 ms - Host latency: 0.747607 ms (end to end 0.759058 ms, enqueue 0.718237 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.656445 ms - Host latency: 0.681567 ms (end to end 0.690552 ms, enqueue 0.652417 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.641846 ms - Host latency: 0.664575 ms (end to end 0.675269 ms, enqueue 0.637671 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.660547 ms - Host latency: 0.68418 ms (end to end 0.693213 ms, enqueue 0.656714 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.648047 ms - Host latency: 0.675024 ms (end to end 0.684253 ms, enqueue 0.643042 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.706689 ms - Host latency: 0.730762 ms (end to end 0.740161 ms, enqueue 0.70188 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.694043 ms - Host latency: 0.718457 ms (end to end 0.727954 ms, enqueue 0.690112 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.708105 ms - Host latency: 0.725903 ms (end to end 0.734204 ms, enqueue 0.663428 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.659204 ms - Host latency: 0.728467 ms (end to end 0.737622 ms, enqueue 0.658398 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647461 ms - Host latency: 0.67063 ms (end to end 0.679883 ms, enqueue 0.64353 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.673242 ms - Host latency: 0.705591 ms (end to end 0.713867 ms, enqueue 0.662573 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663355 ms - Host latency: 0.685693 ms (end to end 0.696362 ms, enqueue 0.659229 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.669849 ms - Host latency: 0.696167 ms (end to end 0.707397 ms, enqueue 0.665625 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655005 ms - Host latency: 0.679907 ms (end to end 0.689087 ms, enqueue 0.650806 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.652612 ms - Host latency: 0.674585 ms (end to end 0.688647 ms, enqueue 0.648486 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.639575 ms - Host latency: 0.662915 ms (end to end 0.671728 ms, enqueue 0.635645 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645508 ms - Host latency: 0.667603 ms (end to end 0.678149 ms, enqueue 0.641235 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.638965 ms - Host latency: 0.661108 ms (end to end 0.671851 ms, enqueue 0.634961 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.656152 ms - Host latency: 0.680664 ms (end to end 0.691577 ms, enqueue 0.6521 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.703564 ms - Host latency: 0.726538 ms (end to end 0.735693 ms, enqueue 0.699414 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.657788 ms - Host latency: 0.67959 ms (end to end 0.690625 ms, enqueue 0.653613 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.654883 ms - Host latency: 0.677148 ms (end to end 0.687866 ms, enqueue 0.650708 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.643262 ms - Host latency: 0.665894 ms (end to end 0.674878 ms, enqueue 0.63938 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645435 ms - Host latency: 0.669702 ms (end to end 0.678784 ms, enqueue 0.639722 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.643237 ms - Host latency: 0.665576 ms (end to end 0.675879 ms, enqueue 0.639258 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.644189 ms - Host latency: 0.666016 ms (end to end 0.674951 ms, enqueue 0.640332 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645605 ms - Host latency: 0.667529 ms (end to end 0.676392 ms, enqueue 0.641748 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.714014 ms - Host latency: 0.737158 ms (end to end 0.746655 ms, enqueue 0.709009 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650732 ms - Host latency: 0.674463 ms (end to end 0.683472 ms, enqueue 0.646582 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.636084 ms - Host latency: 0.6604 ms (end to end 0.669434 ms, enqueue 0.632153 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.655469 ms - Host latency: 0.67771 ms (end to end 0.687036 ms, enqueue 0.64917 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.641113 ms - Host latency: 0.664258 ms (end to end 0.674512 ms, enqueue 0.637085 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.748657 ms - Host latency: 0.771021 ms (end to end 0.780103 ms, enqueue 0.743799 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640015 ms - Host latency: 0.668921 ms (end to end 0.678101 ms, enqueue 0.63606 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.651489 ms - Host latency: 0.675244 ms (end to end 0.684326 ms, enqueue 0.647339 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.641235 ms - Host latency: 0.66604 ms (end to end 0.675098 ms, enqueue 0.635669 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.633325 ms - Host latency: 0.658887 ms (end to end 0.670117 ms, enqueue 0.629419 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.636133 ms - Host latency: 0.657129 ms (end to end 0.667749 ms, enqueue 0.6323 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.630029 ms - Host latency: 0.651685 ms (end to end 0.660889 ms, enqueue 0.626318 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.636743 ms - Host latency: 0.659863 ms (end to end 0.670703 ms, enqueue 0.633008 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.629614 ms - Host latency: 0.651636 ms (end to end 0.660645 ms, enqueue 0.625659 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.677002 ms - Host latency: 0.701172 ms (end to end 0.7104 ms, enqueue 0.672949 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.635791 ms - Host latency: 0.657031 ms (end to end 0.665869 ms, enqueue 0.631396 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.635938 ms - Host latency: 0.659937 ms (end to end 0.671948 ms, enqueue 0.631811 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.632935 ms - Host latency: 0.6573 ms (end to end 0.666504 ms, enqueue 0.629224 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.636011 ms - Host latency: 0.657056 ms (end to end 0.665918 ms, enqueue 0.632153 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645508 ms - Host latency: 0.669995 ms (end to end 0.679004 ms, enqueue 0.641357 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647266 ms - Host latency: 0.669702 ms (end to end 0.67937 ms, enqueue 0.639575 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645361 ms - Host latency: 0.667334 ms (end to end 0.676416 ms, enqueue 0.641431 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.711304 ms - Host latency: 0.738379 ms (end to end 0.747778 ms, enqueue 0.707007 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642334 ms - Host latency: 0.665064 ms (end to end 0.674048 ms, enqueue 0.638281 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.657593 ms - Host latency: 0.681665 ms (end to end 0.692334 ms, enqueue 0.653613 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.639966 ms - Host latency: 0.665064 ms (end to end 0.673999 ms, enqueue 0.635864 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.643115 ms - Host latency: 0.667725 ms (end to end 0.676733 ms, enqueue 0.638574 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640356 ms - Host latency: 0.661914 ms (end to end 0.670923 ms, enqueue 0.636426 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.644312 ms - Host latency: 0.666895 ms (end to end 0.676611 ms, enqueue 0.638159 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.65249 ms - Host latency: 0.705249 ms (end to end 0.717847 ms, enqueue 0.648047 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.658472 ms - Host latency: 0.679639 ms (end to end 0.68916 ms, enqueue 0.652051 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645142 ms - Host latency: 0.684131 ms (end to end 0.693726 ms, enqueue 0.641162 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.641577 ms - Host latency: 0.663574 ms (end to end 0.674414 ms, enqueue 0.637549 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640723 ms - Host latency: 0.662427 ms (end to end 0.672656 ms, enqueue 0.636914 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.726514 ms - Host latency: 0.754028 ms (end to end 0.763843 ms, enqueue 0.722046 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.663013 ms - Host latency: 0.684644 ms (end to end 0.694116 ms, enqueue 0.657788 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.671265 ms - Host latency: 0.69646 ms (end to end 0.706152 ms, enqueue 0.665356 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.724805 ms - Host latency: 0.771118 ms (end to end 0.781274 ms, enqueue 0.706592 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.611938 ms - Host latency: 0.735327 ms (end to end 0.744409 ms, enqueue 0.65293 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646289 ms - Host latency: 0.669092 ms (end to end 0.678418 ms, enqueue 0.642212 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.673389 ms - Host latency: 0.708984 ms (end to end 0.718042 ms, enqueue 0.66023 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.650391 ms - Host latency: 0.674585 ms (end to end 0.684204 ms, enqueue 0.645239 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.644605 ms - Host latency: 0.685352 ms (end to end 0.6948 ms, enqueue 0.640674 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.699487 ms - Host latency: 0.722656 ms (end to end 0.732251 ms, enqueue 0.694873 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647583 ms - Host latency: 0.67251 ms (end to end 0.68457 ms, enqueue 0.643457 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646045 ms - Host latency: 0.669971 ms (end to end 0.679468 ms, enqueue 0.642383 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.632861 ms - Host latency: 0.654346 ms (end to end 0.663599 ms, enqueue 0.629102 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.636865 ms - Host latency: 0.661279 ms (end to end 0.670776 ms, enqueue 0.633008 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.638257 ms - Host latency: 0.664697 ms (end to end 0.674805 ms, enqueue 0.632324 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.750513 ms - Host latency: 0.779077 ms (end to end 0.788745 ms, enqueue 0.746216 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.648511 ms - Host latency: 0.671362 ms (end to end 0.680615 ms, enqueue 0.644287 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.762183 ms - Host latency: 0.784912 ms (end to end 0.794434 ms, enqueue 0.757837 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.661035 ms - Host latency: 0.683594 ms (end to end 0.695825 ms, enqueue 0.656665 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.648096 ms - Host latency: 0.672339 ms (end to end 0.681494 ms, enqueue 0.644043 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642261 ms - Host latency: 0.667041 ms (end to end 0.676221 ms, enqueue 0.638477 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640356 ms - Host latency: 0.662817 ms (end to end 0.671948 ms, enqueue 0.633862 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.678149 ms - Host latency: 0.699854 ms (end to end 0.709082 ms, enqueue 0.673926 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.637622 ms - Host latency: 0.658545 ms (end to end 0.66748 ms, enqueue 0.632837 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.705005 ms - Host latency: 0.73186 ms (end to end 0.741016 ms, enqueue 0.700952 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.633936 ms - Host latency: 0.655688 ms (end to end 0.667871 ms, enqueue 0.630054 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.638501 ms - Host latency: 0.668066 ms (end to end 0.677075 ms, enqueue 0.634741 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645654 ms - Host latency: 0.666821 ms (end to end 0.675855 ms, enqueue 0.641846 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.638257 ms - Host latency: 0.659399 ms (end to end 0.668311 ms, enqueue 0.634497 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.770215 ms - Host latency: 0.79856 ms (end to end 0.811084 ms, enqueue 0.763281 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647485 ms - Host latency: 0.669995 ms (end to end 0.679224 ms, enqueue 0.643384 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.707446 ms - Host latency: 0.730664 ms (end to end 0.739941 ms, enqueue 0.703345 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.644141 ms - Host latency: 0.66582 ms (end to end 0.677148 ms, enqueue 0.640308 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642358 ms - Host latency: 0.669141 ms (end to end 0.680347 ms, enqueue 0.638306 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.633252 ms - Host latency: 0.654028 ms (end to end 0.662964 ms, enqueue 0.629297 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645337 ms - Host latency: 0.670825 ms (end to end 0.679883 ms, enqueue 0.641504 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.638892 ms - Host latency: 0.659766 ms (end to end 0.668726 ms, enqueue 0.63501 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645386 ms - Host latency: 0.666089 ms (end to end 0.675049 ms, enqueue 0.63894 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.701855 ms - Host latency: 0.723462 ms (end to end 0.732544 ms, enqueue 0.697852 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.64895 ms - Host latency: 0.670288 ms (end to end 0.679492 ms, enqueue 0.64519 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642041 ms - Host latency: 0.665112 ms (end to end 0.674268 ms, enqueue 0.638159 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640088 ms - Host latency: 0.664136 ms (end to end 0.674927 ms, enqueue 0.636353 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.638135 ms - Host latency: 0.658984 ms (end to end 0.668262 ms, enqueue 0.634131 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.656177 ms - Host latency: 0.681128 ms (end to end 0.690283 ms, enqueue 0.652344 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642627 ms - Host latency: 0.66377 ms (end to end 0.672632 ms, enqueue 0.638745 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.67417 ms - Host latency: 0.69873 ms (end to end 0.707812 ms, enqueue 0.667993 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.66626 ms - Host latency: 0.689062 ms (end to end 0.699194 ms, enqueue 0.661133 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.643042 ms - Host latency: 0.665698 ms (end to end 0.674976 ms, enqueue 0.639014 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.652808 ms - Host latency: 0.67439 ms (end to end 0.68335 ms, enqueue 0.648877 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647339 ms - Host latency: 0.670044 ms (end to end 0.681836 ms, enqueue 0.642261 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.645801 ms - Host latency: 0.669263 ms (end to end 0.678247 ms, enqueue 0.641992 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.666553 ms - Host latency: 0.688721 ms (end to end 0.697778 ms, enqueue 0.662671 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640869 ms - Host latency: 0.661816 ms (end to end 0.67085 ms, enqueue 0.637085 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.707251 ms - Host latency: 0.729272 ms (end to end 0.740234 ms, enqueue 0.700586 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.64751 ms - Host latency: 0.668555 ms (end to end 0.677588 ms, enqueue 0.64353 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.641162 ms - Host latency: 0.664819 ms (end to end 0.673853 ms, enqueue 0.637354 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.643311 ms - Host latency: 0.666284 ms (end to end 0.675415 ms, enqueue 0.639233 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.63667 ms - Host latency: 0.657617 ms (end to end 0.666675 ms, enqueue 0.631128 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.680762 ms - Host latency: 0.701343 ms (end to end 0.710425 ms, enqueue 0.676904 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647534 ms - Host latency: 0.668213 ms (end to end 0.677051 ms, enqueue 0.643286 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.671143 ms - Host latency: 0.692822 ms (end to end 0.701978 ms, enqueue 0.667065 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.64397 ms - Host latency: 0.666162 ms (end to end 0.679224 ms, enqueue 0.640015 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.652734 ms - Host latency: 0.673975 ms (end to end 0.682861 ms, enqueue 0.648779 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642847 ms - Host latency: 0.665186 ms (end to end 0.674341 ms, enqueue 0.639136 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.637866 ms - Host latency: 0.660791 ms (end to end 0.669751 ms, enqueue 0.63396 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.637793 ms - Host latency: 0.658838 ms (end to end 0.667773 ms, enqueue 0.632275 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646313 ms - Host latency: 0.66875 ms (end to end 0.677734 ms, enqueue 0.642407 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.680518 ms - Host latency: 0.702246 ms (end to end 0.711597 ms, enqueue 0.676709 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.720605 ms - Host latency: 0.745874 ms (end to end 0.756982 ms, enqueue 0.715674 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.639209 ms - Host latency: 0.667578 ms (end to end 0.680103 ms, enqueue 0.632593 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.634668 ms - Host latency: 0.656128 ms (end to end 0.664941 ms, enqueue 0.630859 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640649 ms - Host latency: 0.662671 ms (end to end 0.671582 ms, enqueue 0.636914 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.64209 ms - Host latency: 0.670776 ms (end to end 0.679785 ms, enqueue 0.638403 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.719458 ms - Host latency: 0.74729 ms (end to end 0.760645 ms, enqueue 0.713867 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.722095 ms - Host latency: 0.747583 ms (end to end 0.758032 ms, enqueue 0.717383 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.644336 ms - Host latency: 0.667017 ms (end to end 0.676733 ms, enqueue 0.640356 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.638892 ms - Host latency: 0.663721 ms (end to end 0.672803 ms, enqueue 0.634912 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.641455 ms - Host latency: 0.665137 ms (end to end 0.674414 ms, enqueue 0.635425 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.633032 ms - Host latency: 0.654175 ms (end to end 0.663135 ms, enqueue 0.629248 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646655 ms - Host latency: 0.668066 ms (end to end 0.67915 ms, enqueue 0.642725 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.660913 ms - Host latency: 0.682178 ms (end to end 0.692407 ms, enqueue 0.65708 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.646753 ms - Host latency: 0.667749 ms (end to end 0.67876 ms, enqueue 0.642773 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.649878 ms - Host latency: 0.672632 ms (end to end 0.681689 ms, enqueue 0.645825 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642871 ms - Host latency: 0.664307 ms (end to end 0.67312 ms, enqueue 0.637964 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.644629 ms - Host latency: 0.665967 ms (end to end 0.674805 ms, enqueue 0.640845 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640186 ms - Host latency: 0.661133 ms (end to end 0.670166 ms, enqueue 0.633984 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.642407 ms - Host latency: 0.664355 ms (end to end 0.673267 ms, enqueue 0.638354 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647021 ms - Host latency: 0.668115 ms (end to end 0.679053 ms, enqueue 0.642871 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.640747 ms - Host latency: 0.664233 ms (end to end 0.673169 ms, enqueue 0.636865 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.675537 ms - Host latency: 0.696973 ms (end to end 0.70791 ms, enqueue 0.671362 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.651416 ms - Host latency: 0.678857 ms (end to end 0.687939 ms, enqueue 0.647412 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.64751 ms - Host latency: 0.670386 ms (end to end 0.68042 ms, enqueue 0.643604 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.648608 ms - Host latency: 0.670117 ms (end to end 0.679248 ms, enqueue 0.6448 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.661182 ms - Host latency: 0.6823 ms (end to end 0.691919 ms, enqueue 0.654639 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.647119 ms - Host latency: 0.670483 ms (end to end 0.679468 ms, enqueue 0.643408 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.658276 ms - Host latency: 0.678784 ms (end to end 0.687769 ms, enqueue 0.654321 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.639136 ms - Host latency: 0.660693 ms (end to end 0.6698 ms, enqueue 0.6354 ms)
[11/08/2021-09:09:21] [I] Average on 10 runs - GPU latency: 0.708545 ms - Host latency: 0.735107 ms (end to end 0.747144 ms, enqueue 0.70437 ms)
[11/08/2021-09:09:21] [I] 
[11/08/2021-09:09:21] [I] === Performance summary ===
[11/08/2021-09:09:21] [I] Throughput: 1250.54 qps
[11/08/2021-09:09:21] [I] Latency: min = 0.476074 ms, max = 2.23544 ms, mean = 0.779432 ms, median = 0.679932 ms, percentile(99%) = 2.20663 ms
[11/08/2021-09:09:21] [I] End-to-End Host Latency: min = 0.483643 ms, max = 2.2529 ms, mean = 0.789399 ms, median = 0.690063 ms, percentile(99%) = 2.2226 ms
[11/08/2021-09:09:21] [I] Enqueue Time: min = 0.603027 ms, max = 2.06531 ms, mean = 0.673689 ms, median = 0.649902 ms, percentile(99%) = 1.13696 ms
[11/08/2021-09:09:21] [I] H2D Latency: min = 0.00170898 ms, max = 0.580811 ms, mean = 0.0104649 ms, median = 0.00976562 ms, percentile(99%) = 0.0350342 ms
[11/08/2021-09:09:21] [I] GPU Compute Time: min = 0.472168 ms, max = 2.22324 ms, mean = 0.757412 ms, median = 0.655884 ms, percentile(99%) = 2.19443 ms
[11/08/2021-09:09:21] [I] D2H Latency: min = 0.00146484 ms, max = 0.510132 ms, mean = 0.0115555 ms, median = 0.0114746 ms, percentile(99%) = 0.0366211 ms
[11/08/2021-09:09:21] [I] Total Host Walltime: 3.00109 s
[11/08/2021-09:09:21] [I] Total GPU Compute Time: 2.84257 s
[11/08/2021-09:09:21] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[11/08/2021-09:09:21] [W]   If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[11/08/2021-09:09:21] [I] Explanations of the performance metrics are printed in the verbose logs.
[11/08/2021-09:09:21] [V] 
[11/08/2021-09:09:21] [V] === Explanations of the performance metrics ===
[11/08/2021-09:09:21] [V] Total Host Walltime: the host walltime from when the first query (after warmups) is enqueued to when the last query is completed.
[11/08/2021-09:09:21] [V] GPU Compute Time: the GPU latency to execute the kernels for a query.
[11/08/2021-09:09:21] [V] Total GPU Compute Time: the summation of the GPU Compute Time of all the queries. If this is significantly shorter than Total Host Walltime, the GPU may be under-utilized because of host-side overheads or data transfers.
[11/08/2021-09:09:21] [V] Throughput: the observed throughput computed by dividing the number of queries by the Total Host Walltime. If this is significantly lower than the reciprocal of GPU Compute Time, the GPU may be under-utilized because of host-side overheads or data transfers.
[11/08/2021-09:09:21] [V] Enqueue Time: the host latency to enqueue a query. If this is longer than GPU Compute Time, the GPU may be under-utilized.
[11/08/2021-09:09:21] [V] H2D Latency: the latency for host-to-device data transfers for input tensors of a single query.
[11/08/2021-09:09:21] [V] D2H Latency: the latency for device-to-host data transfers for output tensors of a single query.
[11/08/2021-09:09:21] [V] Latency: the summation of H2D Latency, GPU Compute Time, and D2H Latency. This is the latency to infer a single query.
[11/08/2021-09:09:21] [V] End-to-End Host Latency: the duration from when the H2D of a query is called to when the D2H of the same query is completed, which includes the latency to wait for the completion of the previous query. This is the latency of a query if multiple queries are enqueued consecutively.
[11/08/2021-09:09:21] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.trt --workspace=1024 --verbose
[11/08/2021-09:09:21] [V] [TRT] myelinFreeCb freeing GPU at 0x20d3dcc00.
[11/08/2021-09:09:21] [V] [TRT] myelinFreeCb freeing CPU at 0x7f3c002f00.
[11/08/2021-09:09:21] [V] [TRT] myelinFreeCb freeing GPU at 0x20d39a000.
[11/08/2021-09:09:21] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 922, GPU 7660 (MiB)

Hi,

Have you tried running the model with ONNX Runtime? Are you getting correct results there?
Thank you.
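
In case it helps, here is a minimal sketch of what such a check could look like. It assumes the `onnxruntime` package is installed and that the model has a single input and you only care about the first output; the helper names `run_onnxruntime` and `compare_outputs` and the tolerances are made up for illustration, not taken from any NVIDIA sample.

```python
def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Compare two flat sequences of floats.

    Returns (max_abs_diff, all_close). The tolerances are illustrative
    defaults, not values recommended by TensorRT or ONNX Runtime.
    """
    if len(reference) != len(candidate):
        raise ValueError("output sizes differ")
    max_abs_diff = 0.0
    all_close = True
    for r, c in zip(reference, candidate):
        diff = abs(r - c)
        if diff > max_abs_diff:
            max_abs_diff = diff
        # "not (diff <= tol)" also flags NaN values as a mismatch.
        if not (diff <= atol + rtol * abs(r)):
            all_close = False
    return max_abs_diff, all_close


def run_onnxruntime(onnx_path, input_array):
    """Run one inference on the CPU and return the first output as a flat list.

    Assumes a single-input model; input_array must be a NumPy array whose
    shape and dtype match what the model expects.
    """
    import onnxruntime as ort  # imported lazily so the helper above has no dependencies

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: input_array})
    return outputs[0].ravel().tolist()
```

Feed the same input tensor to the torch model, to `run_onnxruntime("model.onnx", x)`, and to the TensorRT engine (however you run it), then pass each pair of flattened outputs to `compare_outputs`. If torch and ONNX Runtime agree but the engine output does not, the problem is most likely in the ONNX → TensorRT step or in how the engine's input and output buffers are handled.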