LPRNet can’t use exported engine file

Please provide the following information when requesting support.

• Hardware: RTX 3060
• Network type: LPRNet
• TAO toolkit version: 3.21.11
• Training spec file: my_spec.txt (2.5 KB)

This continues the earlier thread “LPRNet can’t use exported engine file”.

I managed to convert the LPRNet .etlt model to a .engine file, yet trt_runtime.deserialize_cuda_engine(engine_data) still returns None. trtexec, on the other hand, loads the same engine just fine.

[01/04/2022-14:20:27] [TRT] [E] 1: [stdArchiveReader.cpp::StdArchiveReader::35] Error Code 1: Serialization (Serialization assertion safeVersionRead == safeSerializationVersion failed.Version tag does not match. Note: Current Version: 0, Serialized Engine Version: 43)
[01/04/2022-14:20:27] [TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
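The key clue is the version tag in the first error line: the engine was serialized by a different TensorRT build than the runtime now trying to deserialize it. A pure-Python sketch (the regex and variable names are mine, not a TensorRT API) that pulls the two tags out of the logged message:

```python
import re

# Error text as logged by TensorRT above (trimmed to the relevant part).
trt_error = (
    "Serialization assertion safeVersionRead == safeSerializationVersion "
    "failed.Version tag does not match. "
    "Note: Current Version: 0, Serialized Engine Version: 43"
)

m = re.search(r"Current Version: (\d+), Serialized Engine Version: (\d+)", trt_error)
current_tag, serialized_tag = map(int, m.groups())
print(current_tag, serialized_tag)  # 0 43

# A mismatch means the deserializing runtime is not the TensorRT build
# that produced the engine, so deserialize_cuda_engine returns None.
assert current_tag != serialized_tag
```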

Output from tao-converter:

$ ./tao-converter custom_lprnet.etlt -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96  -t fp16 -e custom_lprnet.engine
[INFO] [MemUsageChange] Init CUDA: CPU +534, GPU +0, now: CPU 540, GPU 1071 (MiB)
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/filePPfaWK
[INFO] ONNX IR version:  0.0.7
[INFO] Opset version:    13
[INFO] Producer name:    keras2onnx
[INFO] Producer version: 1.8.1
[INFO] Domain:           onnxmltools
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[WARNING] ShapedWeights.cpp:173: Weights td_dense/kernel:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[WARNING] Tensor DataType is determined at build time for tensors not marked as input or output.
[INFO] Detected input dimensions from the model: (-1, 3, 48, 96)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 3, 48, 96) for input: image_input
[INFO] Using optimization profile opt shape: (4, 3, 48, 96) for input: image_input
[INFO] Using optimization profile max shape: (16, 3, 48, 96) for input: image_input
[INFO] [MemUsageSnapshot] Builder begin: CPU 595 MiB, GPU 1071 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +816, GPU +352, now: CPU 1411, GPU 1423 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1537, GPU 1481 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 57232
[INFO] Total Device Persistent Memory: 12296192
[INFO] Total Scratch Memory: 23543888
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 41 MiB, GPU 4 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2706, GPU 2027 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 2707, GPU 2035 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2706, GPU 2019 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2706, GPU 2001 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 2706 MiB, GPU 2001 MiB
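For completeness, the failing Python load path looks roughly like this (a minimal sketch, not my exact script; `load_engine` and `engine_path` are my own names, and the tensorrt import is guarded so the sketch stands alone):

```python
try:
    import tensorrt as trt
except ImportError:
    trt = None  # the structure below still shows the intent

def load_engine(engine_path):
    """Deserialize a TensorRT engine file; raise instead of returning None."""
    if trt is None:
        raise RuntimeError("tensorrt python package not available")
    logger = trt.Logger(trt.Logger.WARNING)
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        # On failure TensorRT logs the serialization error and returns None.
        raise RuntimeError("deserialize_cuda_engine returned None")
    return engine
```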

Output from trtexec:

$ ./trtexec --loadEngine=/mnt/dev/deepstream_lpr_app/models/LP/LPR/custom_lprnet.engine 
&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # ./trtexec --loadEngine=/mnt/dev/deepstream_lpr_app/models/LP/LPR/custom_lprnet.engine
[01/04/2022-13:55:48] [I] === Model Options ===
[01/04/2022-13:55:48] [I] Format: *
[01/04/2022-13:55:48] [I] Model: 
[01/04/2022-13:55:48] [I] Output:
[01/04/2022-13:55:48] [I] === Build Options ===
[01/04/2022-13:55:48] [I] Max batch: 1
[01/04/2022-13:55:48] [I] Workspace: 16 MiB
[01/04/2022-13:55:48] [I] minTiming: 1
[01/04/2022-13:55:48] [I] avgTiming: 8
[01/04/2022-13:55:48] [I] Precision: FP32
[01/04/2022-13:55:48] [I] Calibration: 
[01/04/2022-13:55:48] [I] Refit: Disabled
[01/04/2022-13:55:48] [I] Sparsity: Disabled
[01/04/2022-13:55:48] [I] Safe mode: Disabled
[01/04/2022-13:55:48] [I] Restricted mode: Disabled
[01/04/2022-13:55:48] [I] Save engine: 
[01/04/2022-13:55:48] [I] Load engine: /mnt/dev/deepstream_lpr_app/models/LP/LPR/custom_lprnet.engine
[01/04/2022-13:55:48] [I] NVTX verbosity: 0
[01/04/2022-13:55:48] [I] Tactic sources: Using default tactic sources
[01/04/2022-13:55:48] [I] timingCacheMode: local
[01/04/2022-13:55:48] [I] timingCacheFile: 
[01/04/2022-13:55:48] [I] Input(s)s format: fp32:CHW
[01/04/2022-13:55:48] [I] Output(s)s format: fp32:CHW
[01/04/2022-13:55:48] [I] Input build shapes: model
[01/04/2022-13:55:48] [I] Input calibration shapes: model
[01/04/2022-13:55:48] [I] === System Options ===
[01/04/2022-13:55:48] [I] Device: 0
[01/04/2022-13:55:48] [I] DLACore: 
[01/04/2022-13:55:48] [I] Plugins:
[01/04/2022-13:55:48] [I] === Inference Options ===
[01/04/2022-13:55:48] [I] Batch: 1
[01/04/2022-13:55:48] [I] Input inference shapes: model
[01/04/2022-13:55:48] [I] Iterations: 10
[01/04/2022-13:55:48] [I] Duration: 3s (+ 200ms warm up)
[01/04/2022-13:55:48] [I] Sleep time: 0ms
[01/04/2022-13:55:48] [I] Streams: 1
[01/04/2022-13:55:48] [I] ExposeDMA: Disabled
[01/04/2022-13:55:48] [I] Data transfers: Enabled
[01/04/2022-13:55:48] [I] Spin-wait: Disabled
[01/04/2022-13:55:48] [I] Multithreading: Disabled
[01/04/2022-13:55:48] [I] CUDA Graph: Disabled
[01/04/2022-13:55:48] [I] Separate profiling: Disabled
[01/04/2022-13:55:48] [I] Time Deserialize: Disabled
[01/04/2022-13:55:48] [I] Time Refit: Disabled
[01/04/2022-13:55:48] [I] Skip inference: Disabled
[01/04/2022-13:55:48] [I] Inputs:
[01/04/2022-13:55:48] [I] === Reporting Options ===
[01/04/2022-13:55:48] [I] Verbose: Disabled
[01/04/2022-13:55:48] [I] Averages: 10 inferences
[01/04/2022-13:55:48] [I] Percentile: 99
[01/04/2022-13:55:48] [I] Dump refittable layers:Disabled
[01/04/2022-13:55:48] [I] Dump output: Disabled
[01/04/2022-13:55:48] [I] Profile: Disabled
[01/04/2022-13:55:48] [I] Export timing to JSON file: 
[01/04/2022-13:55:48] [I] Export output to JSON file: 
[01/04/2022-13:55:48] [I] Export profile to JSON file: 
[01/04/2022-13:55:48] [I] 
[01/04/2022-13:55:48] [I] === Device Information ===
[01/04/2022-13:55:48] [I] Selected Device: NVIDIA GeForce RTX 3060
[01/04/2022-13:55:48] [I] Compute Capability: 8.6
[01/04/2022-13:55:48] [I] SMs: 28
[01/04/2022-13:55:48] [I] Compute Clock Rate: 1.807 GHz
[01/04/2022-13:55:48] [I] Device Global Memory: 12045 MiB
[01/04/2022-13:55:48] [I] Shared Memory per SM: 100 KiB
[01/04/2022-13:55:48] [I] Memory Bus Width: 192 bits (ECC disabled)
[01/04/2022-13:55:48] [I] Memory Clock Rate: 7.501 GHz
[01/04/2022-13:55:48] [I] 
[01/04/2022-13:55:48] [I] TensorRT version: 8003
[01/04/2022-13:55:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +532, GPU +0, now: CPU 590, GPU 1053 (MiB)
[01/04/2022-13:55:48] [I] [TRT] Loaded engine size: 50 MB
[01/04/2022-13:55:48] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 590 MiB, GPU 1053 MiB
[01/04/2022-13:55:49] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +816, GPU +352, now: CPU 1411, GPU 1425 (MiB)
[01/04/2022-13:55:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +125, GPU +58, now: CPU 1536, GPU 1483 (MiB)
[01/04/2022-13:55:49] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1536, GPU 1465 (MiB)
[01/04/2022-13:55:49] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1536 MiB, GPU 1465 MiB
[01/04/2022-13:55:49] [I] Engine loaded in 1.00836 sec.
[01/04/2022-13:55:49] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 1486 MiB, GPU 1465 MiB
[01/04/2022-13:55:49] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1486, GPU 1475 (MiB)
[01/04/2022-13:55:49] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1486, GPU 1483 (MiB)
[01/04/2022-13:55:49] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1503 MiB, GPU 1543 MiB
[01/04/2022-13:55:49] [W] Dynamic dimensions required for input: image_input, but no shapes were provided. Automatically overriding shape to: 1x3x48x96
[01/04/2022-13:55:49] [I] Created input binding for image_input with dimensions 1x3x48x96
[01/04/2022-13:55:49] [I] Created output binding for tf_op_layer_ArgMax with dimensions 1x24
[01/04/2022-13:55:49] [I] Created output binding for tf_op_layer_Max with dimensions 1x24
[01/04/2022-13:55:49] [I] Starting inference
[01/04/2022-13:55:52] [I] Warmup completed 124 queries over 200 ms
[01/04/2022-13:55:52] [I] Timing trace has 2126 queries over 3.00386 s
[01/04/2022-13:55:52] [I] 
[01/04/2022-13:55:52] [I] === Trace details ===
[01/04/2022-13:55:52] [I] Trace averages of 10 runs:
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.52361 ms - Host latency: 1.54343 ms (end to end 2.84865 ms, enqueue 0.556534 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.52402 ms - Host latency: 1.54493 ms (end to end 2.85734 ms, enqueue 0.573395 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.52453 ms - Host latency: 1.54281 ms (end to end 2.88618 ms, enqueue 0.445921 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4306 ms - Host latency: 1.4415 ms (end to end 2.65231 ms, enqueue 0.152411 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41466 ms - Host latency: 1.42756 ms (end to end 2.66879 ms, enqueue 0.240454 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41404 ms - Host latency: 1.43551 ms (end to end 2.60792 ms, enqueue 0.602417 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41363 ms - Host latency: 1.43366 ms (end to end 2.625 ms, enqueue 0.580264 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41465 ms - Host latency: 1.43394 ms (end to end 2.65197 ms, enqueue 0.541055 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41404 ms - Host latency: 1.43371 ms (end to end 2.64316 ms, enqueue 0.526752 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41465 ms - Host latency: 1.43402 ms (end to end 2.64478 ms, enqueue 0.538739 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41415 ms - Host latency: 1.4316 ms (end to end 2.66984 ms, enqueue 0.511105 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41414 ms - Host latency: 1.43404 ms (end to end 2.62812 ms, enqueue 0.552277 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41435 ms - Host latency: 1.43251 ms (end to end 2.62904 ms, enqueue 0.52886 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41455 ms - Host latency: 1.43401 ms (end to end 2.63496 ms, enqueue 0.525006 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41404 ms - Host latency: 1.4328 ms (end to end 2.64493 ms, enqueue 0.539584 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41425 ms - Host latency: 1.43466 ms (end to end 2.65576 ms, enqueue 0.520151 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41128 ms - Host latency: 1.43232 ms (end to end 2.62676 ms, enqueue 0.545084 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.41145 ms - Host latency: 1.42874 ms (end to end 2.45858 ms, enqueue 0.579575 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40288 ms - Host latency: 1.42514 ms (end to end 2.59145 ms, enqueue 0.607458 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4035 ms - Host latency: 1.4229 ms (end to end 2.62205 ms, enqueue 0.543011 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4038 ms - Host latency: 1.42228 ms (end to end 2.62235 ms, enqueue 0.497369 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4037 ms - Host latency: 1.42473 ms (end to end 2.63441 ms, enqueue 0.556192 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40288 ms - Host latency: 1.42125 ms (end to end 2.64717 ms, enqueue 0.506567 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40472 ms - Host latency: 1.42545 ms (end to end 2.60491 ms, enqueue 0.547675 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40369 ms - Host latency: 1.4214 ms (end to end 2.6497 ms, enqueue 0.513043 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40421 ms - Host latency: 1.42256 ms (end to end 2.61591 ms, enqueue 0.476111 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40206 ms - Host latency: 1.4187 ms (end to end 2.65502 ms, enqueue 0.515918 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40278 ms - Host latency: 1.42303 ms (end to end 2.60965 ms, enqueue 0.564655 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4039 ms - Host latency: 1.42324 ms (end to end 2.62701 ms, enqueue 0.536389 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4033 ms - Host latency: 1.42289 ms (end to end 2.61636 ms, enqueue 0.548358 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4033 ms - Host latency: 1.42194 ms (end to end 2.614 ms, enqueue 0.534338 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40298 ms - Host latency: 1.42197 ms (end to end 2.63948 ms, enqueue 0.520679 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.46534 ms - Host latency: 1.48431 ms (end to end 2.68913 ms, enqueue 0.505548 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.43297 ms - Host latency: 1.4528 ms (end to end 2.70878 ms, enqueue 0.5367 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.44008 ms - Host latency: 1.45919 ms (end to end 2.68724 ms, enqueue 0.523395 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40258 ms - Host latency: 1.42194 ms (end to end 2.63815 ms, enqueue 0.527161 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40308 ms - Host latency: 1.42345 ms (end to end 2.61327 ms, enqueue 0.534009 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40319 ms - Host latency: 1.42061 ms (end to end 2.64171 ms, enqueue 0.509937 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40268 ms - Host latency: 1.42294 ms (end to end 2.61428 ms, enqueue 0.548669 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40369 ms - Host latency: 1.4222 ms (end to end 2.63947 ms, enqueue 0.511584 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40359 ms - Host latency: 1.42307 ms (end to end 2.60562 ms, enqueue 0.522986 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40288 ms - Host latency: 1.42217 ms (end to end 2.61633 ms, enqueue 0.530145 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.404 ms - Host latency: 1.42402 ms (end to end 2.61804 ms, enqueue 0.527563 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40339 ms - Host latency: 1.42243 ms (end to end 2.62677 ms, enqueue 0.508484 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40369 ms - Host latency: 1.42205 ms (end to end 2.63448 ms, enqueue 0.494751 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4034 ms - Host latency: 1.42312 ms (end to end 2.60021 ms, enqueue 0.531445 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40212 ms - Host latency: 1.428 ms (end to end 2.56618 ms, enqueue 0.766156 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40237 ms - Host latency: 1.42233 ms (end to end 2.59388 ms, enqueue 0.580994 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40291 ms - Host latency: 1.42388 ms (end to end 2.58704 ms, enqueue 0.575366 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40301 ms - Host latency: 1.42303 ms (end to end 2.60557 ms, enqueue 0.551361 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40278 ms - Host latency: 1.42236 ms (end to end 2.58831 ms, enqueue 0.543933 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40339 ms - Host latency: 1.42427 ms (end to end 2.59467 ms, enqueue 0.540936 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40339 ms - Host latency: 1.42211 ms (end to end 2.61695 ms, enqueue 0.52951 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40257 ms - Host latency: 1.41936 ms (end to end 2.63641 ms, enqueue 0.367285 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40258 ms - Host latency: 1.42138 ms (end to end 2.56877 ms, enqueue 0.563348 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40328 ms - Host latency: 1.42236 ms (end to end 2.60413 ms, enqueue 0.524896 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40288 ms - Host latency: 1.42205 ms (end to end 2.61113 ms, enqueue 0.530902 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4026 ms - Host latency: 1.42326 ms (end to end 2.45159 ms, enqueue 0.557129 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4041 ms - Host latency: 1.42417 ms (end to end 2.61801 ms, enqueue 0.523804 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40223 ms - Host latency: 1.42488 ms (end to end 2.56683 ms, enqueue 0.697522 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40239 ms - Host latency: 1.417 ms (end to end 2.66138 ms, enqueue 0.332202 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4021 ms - Host latency: 1.42363 ms (end to end 2.55023 ms, enqueue 0.618457 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40111 ms - Host latency: 1.42877 ms (end to end 2.53738 ms, enqueue 0.779419 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40214 ms - Host latency: 1.42345 ms (end to end 2.57217 ms, enqueue 0.640662 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40287 ms - Host latency: 1.42371 ms (end to end 2.58866 ms, enqueue 0.568079 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40338 ms - Host latency: 1.42299 ms (end to end 2.61147 ms, enqueue 0.54093 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40421 ms - Host latency: 1.4254 ms (end to end 2.58046 ms, enqueue 0.557495 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40328 ms - Host latency: 1.42014 ms (end to end 2.6377 ms, enqueue 0.494214 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4036 ms - Host latency: 1.42297 ms (end to end 2.59132 ms, enqueue 0.548181 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40371 ms - Host latency: 1.42395 ms (end to end 2.5864 ms, enqueue 0.568103 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4036 ms - Host latency: 1.42345 ms (end to end 2.58767 ms, enqueue 0.53916 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40359 ms - Host latency: 1.42371 ms (end to end 2.60289 ms, enqueue 0.551965 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4027 ms - Host latency: 1.41892 ms (end to end 2.62767 ms, enqueue 0.331873 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4027 ms - Host latency: 1.41844 ms (end to end 2.61064 ms, enqueue 0.32793 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40219 ms - Host latency: 1.42112 ms (end to end 2.62864 ms, enqueue 0.477124 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.49924 ms - Host latency: 1.51393 ms (end to end 2.82109 ms, enqueue 0.286475 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40338 ms - Host latency: 1.4251 ms (end to end 2.57394 ms, enqueue 0.571936 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40348 ms - Host latency: 1.42274 ms (end to end 2.59863 ms, enqueue 0.549658 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40205 ms - Host latency: 1.42134 ms (end to end 2.61382 ms, enqueue 0.529626 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40248 ms - Host latency: 1.42029 ms (end to end 2.59882 ms, enqueue 0.416809 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40317 ms - Host latency: 1.4224 ms (end to end 2.5925 ms, enqueue 0.546875 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40194 ms - Host latency: 1.42065 ms (end to end 2.6297 ms, enqueue 0.470581 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40258 ms - Host latency: 1.41631 ms (end to end 2.63195 ms, enqueue 0.218396 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40118 ms - Host latency: 1.4243 ms (end to end 2.57117 ms, enqueue 0.604883 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40389 ms - Host latency: 1.42338 ms (end to end 2.5916 ms, enqueue 0.526709 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40258 ms - Host latency: 1.42162 ms (end to end 2.61622 ms, enqueue 0.535132 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40238 ms - Host latency: 1.41622 ms (end to end 2.64202 ms, enqueue 0.241272 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40287 ms - Host latency: 1.42396 ms (end to end 2.58872 ms, enqueue 0.552319 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40349 ms - Host latency: 1.42225 ms (end to end 2.65262 ms, enqueue 0.487732 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40339 ms - Host latency: 1.41895 ms (end to end 2.6262 ms, enqueue 0.308472 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40145 ms - Host latency: 1.41266 ms (end to end 2.71826 ms, enqueue 0.158215 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40328 ms - Host latency: 1.41553 ms (end to end 2.67394 ms, enqueue 0.157104 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4027 ms - Host latency: 1.41378 ms (end to end 2.68374 ms, enqueue 0.156323 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40288 ms - Host latency: 1.4255 ms (end to end 2.56459 ms, enqueue 0.690588 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40245 ms - Host latency: 1.42236 ms (end to end 2.62078 ms, enqueue 0.58523 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40289 ms - Host latency: 1.42181 ms (end to end 2.6411 ms, enqueue 0.519287 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40228 ms - Host latency: 1.42278 ms (end to end 2.58468 ms, enqueue 0.510547 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40237 ms - Host latency: 1.42169 ms (end to end 2.60883 ms, enqueue 0.510461 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40363 ms - Host latency: 1.42533 ms (end to end 2.589 ms, enqueue 0.542615 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40309 ms - Host latency: 1.42146 ms (end to end 2.63153 ms, enqueue 0.529297 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40259 ms - Host latency: 1.42302 ms (end to end 2.58875 ms, enqueue 0.549255 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4028 ms - Host latency: 1.42106 ms (end to end 2.62828 ms, enqueue 0.537695 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40254 ms - Host latency: 1.42299 ms (end to end 2.60973 ms, enqueue 0.548267 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4038 ms - Host latency: 1.42653 ms (end to end 2.61033 ms, enqueue 0.546021 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40411 ms - Host latency: 1.42461 ms (end to end 2.58187 ms, enqueue 0.540332 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40295 ms - Host latency: 1.42328 ms (end to end 2.58802 ms, enqueue 0.529358 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40281 ms - Host latency: 1.42255 ms (end to end 2.59298 ms, enqueue 0.535425 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4035 ms - Host latency: 1.42384 ms (end to end 2.59165 ms, enqueue 0.565894 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4033 ms - Host latency: 1.42246 ms (end to end 2.60692 ms, enqueue 0.526209 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40287 ms - Host latency: 1.42374 ms (end to end 2.61194 ms, enqueue 0.542615 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40271 ms - Host latency: 1.42264 ms (end to end 2.60477 ms, enqueue 0.542212 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40432 ms - Host latency: 1.42749 ms (end to end 2.57422 ms, enqueue 0.55448 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40277 ms - Host latency: 1.42192 ms (end to end 2.59058 ms, enqueue 0.534204 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40331 ms - Host latency: 1.42294 ms (end to end 2.60044 ms, enqueue 0.53927 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40471 ms - Host latency: 1.42321 ms (end to end 2.58724 ms, enqueue 0.528369 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40298 ms - Host latency: 1.42396 ms (end to end 2.59023 ms, enqueue 0.56449 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40338 ms - Host latency: 1.42328 ms (end to end 2.59392 ms, enqueue 0.544116 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.42665 ms - Host latency: 1.44761 ms (end to end 2.66449 ms, enqueue 0.537891 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.44142 ms - Host latency: 1.45833 ms (end to end 2.70304 ms, enqueue 0.360889 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.42316 ms - Host latency: 1.43839 ms (end to end 2.67986 ms, enqueue 0.378345 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40341 ms - Host latency: 1.42261 ms (end to end 2.62876 ms, enqueue 0.562012 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40309 ms - Host latency: 1.42397 ms (end to end 2.59849 ms, enqueue 0.556494 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40356 ms - Host latency: 1.42253 ms (end to end 2.63525 ms, enqueue 0.5297 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40299 ms - Host latency: 1.42476 ms (end to end 2.59893 ms, enqueue 0.568396 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40167 ms - Host latency: 1.42063 ms (end to end 2.65686 ms, enqueue 0.535535 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40216 ms - Host latency: 1.42177 ms (end to end 2.60592 ms, enqueue 0.545752 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40315 ms - Host latency: 1.42297 ms (end to end 2.6186 ms, enqueue 0.543494 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4032 ms - Host latency: 1.42352 ms (end to end 2.62719 ms, enqueue 0.542297 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40372 ms - Host latency: 1.42191 ms (end to end 2.63263 ms, enqueue 0.525073 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40287 ms - Host latency: 1.42124 ms (end to end 2.65 ms, enqueue 0.522424 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40393 ms - Host latency: 1.42543 ms (end to end 2.60392 ms, enqueue 0.57793 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40291 ms - Host latency: 1.42266 ms (end to end 2.64214 ms, enqueue 0.526343 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40269 ms - Host latency: 1.42278 ms (end to end 2.60693 ms, enqueue 0.522974 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40178 ms - Host latency: 1.42046 ms (end to end 2.61204 ms, enqueue 0.529907 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40305 ms - Host latency: 1.42419 ms (end to end 2.6292 ms, enqueue 0.546582 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.4041 ms - Host latency: 1.42234 ms (end to end 2.63831 ms, enqueue 0.50752 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40308 ms - Host latency: 1.42427 ms (end to end 2.60071 ms, enqueue 0.561572 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40356 ms - Host latency: 1.42153 ms (end to end 2.66311 ms, enqueue 0.503613 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40217 ms - Host latency: 1.42156 ms (end to end 2.63704 ms, enqueue 0.549146 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40349 ms - Host latency: 1.42227 ms (end to end 2.59788 ms, enqueue 0.524805 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40276 ms - Host latency: 1.42307 ms (end to end 2.58684 ms, enqueue 0.533398 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40291 ms - Host latency: 1.42302 ms (end to end 2.58499 ms, enqueue 0.556152 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40198 ms - Host latency: 1.42073 ms (end to end 2.61875 ms, enqueue 0.512427 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40349 ms - Host latency: 1.42307 ms (end to end 2.59243 ms, enqueue 0.535156 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40288 ms - Host latency: 1.42288 ms (end to end 2.61938 ms, enqueue 0.527295 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40261 ms - Host latency: 1.42253 ms (end to end 2.63596 ms, enqueue 0.534766 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40298 ms - Host latency: 1.42178 ms (end to end 2.596 ms, enqueue 0.526123 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40239 ms - Host latency: 1.42307 ms (end to end 2.58572 ms, enqueue 0.573657 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40361 ms - Host latency: 1.42229 ms (end to end 2.59229 ms, enqueue 0.515503 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.40356 ms - Host latency: 1.424 ms (end to end 2.59321 ms, enqueue 0.543066 ms)
[01/04/2022-13:55:52] [I] Average on 10 runs - GPU latency: 1.403 ms - Host latency: 1.42192 ms (end to end 2.5928 ms, enqueue 0.53916 ms)
...
[01/04/2022-13:55:52] [I] 
[01/04/2022-13:55:52] [I] === Performance summary ===
[01/04/2022-13:55:52] [I] Throughput: 707.757 qps
[01/04/2022-13:55:52] [I] Latency: min = 1.40955 ms, max = 1.85626 ms, mean = 1.42731 ms, median = 1.42383 ms, percentile(99%) = 1.54631 ms
[01/04/2022-13:55:52] [I] End-to-End Host Latency: min = 1.49722 ms, max = 3.36011 ms, mean = 2.61908 ms, median = 2.604 ms, percentile(99%) = 2.91623 ms
[01/04/2022-13:55:52] [I] Enqueue Time: min = 0.12532 ms, max = 0.989624 ms, mean = 0.521513 ms, median = 0.555664 ms, percentile(99%) = 0.809937 ms
[01/04/2022-13:55:52] [I] H2D Latency: min = 0.00549316 ms, max = 0.034668 ms, mean = 0.0144739 ms, median = 0.0161133 ms, percentile(99%) = 0.0234375 ms
[01/04/2022-13:55:52] [I] GPU Compute Time: min = 1.39868 ms, max = 1.83502 ms, mean = 1.40796 ms, median = 1.40283 ms, percentile(99%) = 1.52577 ms
[01/04/2022-13:55:52] [I] D2H Latency: min = 0.00346375 ms, max = 0.0275879 ms, mean = 0.00486725 ms, median = 0.00476074 ms, percentile(99%) = 0.00634766 ms
[01/04/2022-13:55:52] [I] Total Host Walltime: 3.00386 s
[01/04/2022-13:55:52] [I] Total GPU Compute Time: 2.99333 s
[01/04/2022-13:55:52] [I] Explanations of the performance metrics are printed in the verbose logs.
[01/04/2022-13:55:52] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8003] # ./trtexec --loadEngine=/mnt/dev/deepstream_lpr_app/models/LP/LPR/custom_lprnet.engine
[01/04/2022-13:55:52] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1486, GPU 1509 (MiB)

The above error log usually means the TensorRT version differs between the inference environment and the environment where the TensorRT engine was generated.
You can try to generate the TRT engine in the 3.21.11 docker and then run inference in the same 3.21.11 docker.
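For reference, the "Serialization assertion safeVersionRead == safeSerializationVersion failed" message means the version tag baked into the engine does not match the runtime deserializing it: TensorRT engines are only guaranteed to load on the same TensorRT build that serialized them. A minimal sketch of that compatibility rule (the version strings below are hypothetical examples, not values read from the actual engine):

```python
def engines_compatible(builder_version: str, runtime_version: str) -> bool:
    """TensorRT engines are tied to the build that serialized them:
    major.minor.patch must match between builder and runtime."""
    return builder_version.split(".")[:3] == runtime_version.split(".")[:3]

# hypothetical: engine built with TRT 8.2.1.8, runtime loads TRT 8.0.1
print(engines_compatible("8.2.1.8", "8.0.1"))  # False -> deserialization fails
print(engines_compatible("8.0.1.6", "8.0.1"))  # True  -> same 8.0.1 build line
```

In practice this means tao-converter (and the libnvinfer it links against) and the Python `tensorrt` module must come from the same TensorRT installation.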

When I run docker I get this message: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
I'm not sure how to check the full version info on it.

One thing though: I run inference on my Ubuntu 20.04 PC with an RTX 3060, not in docker.
And if I use the engine in deepstream_lpr_app, it loads and works as intended.

$ nvidia-smi 
Wed Jan  5 08:06:01 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   44C    P8    15W / 170W |    322MiB / 12045MiB |     11%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1092      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      1745      G   /usr/lib/xorg/Xorg                 91MiB |
|    0   N/A  N/A      1878      G   /usr/bin/gnome-shell               90MiB |
|    0   N/A  N/A      4200      G   ...AAAAAAAAA= --shared-files       94MiB |
+-----------------------------------------------------------------------------+
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
$ dpkg -l | grep TensorRT
ii  graphsurgeon-tf                                             8.0.1-1+cuda11.3                      amd64        GraphSurgeon for TensorRT package
ii  libnvinfer-bin                                              8.0.1-1+cuda11.3                      amd64        TensorRT binaries
ii  libnvinfer-dev                                              8.0.1-1+cuda11.3                      amd64        TensorRT development libraries and headers
ii  libnvinfer-doc                                              8.0.1-1+cuda11.3                      all          TensorRT documentation
ii  libnvinfer-plugin-dev                                       8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-plugin8                                          8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-samples                                          8.0.1-1+cuda11.3                      all          TensorRT samples
ii  libnvinfer8                                                 8.0.1-1+cuda11.3                      amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                                        8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvonnxparsers8                                           8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                                            8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
ii  libnvparsers8                                               8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
ii  onnx-graphsurgeon                                           8.0.1-1+cuda11.3                      amd64        ONNX GraphSurgeon for TensorRT package

May I know where you ran the below command, inside or outside docker?
$ ./tao-converter custom_lprnet.etlt -k nvidia_tlt -p image_input,1x3x48x96,4x3x48x96,16x3x48x96 -t fp16 -e custom_lprnet.engine

Outside of it.

How did you get the above error log? Could you share the full command and full log?

I get this error if I run the following code in a non-virtual environment:

import tensorrt as trt

def load_engine(trt_runtime, engine_path):
    # Read the serialized engine from disk
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    # Returns None if the runtime cannot deserialize the engine
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine


TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)
trt_engine_path = '/mnt/dev/deepstream_lpr_app/models/LP/LPR/custom_lprnet.engine'
trt_engine = load_engine(trt_runtime, trt_engine_path)

if trt_engine is not None:
    print("Success")
else:
    print("Failed")

OK, then you can run tao-converter again to generate a new TensorRT engine under the non-virtual environment.
Then please check again.

That’s exactly what I did before. For the current project I don’t use venv.

For your current non-virtual environment, could you run the command below and share the result?
$ dpkg -l |grep cuda

$ dpkg -l |grep cuda
ii  cuda                                                        11.5.1-1                              amd64        CUDA meta-package
ii  cuda-11-5                                                   11.5.1-1                              amd64        CUDA 11.5 meta-package
ii  cuda-cccl-11-4                                              11.4.122-1                            amd64        CUDA CCCL
ii  cuda-cccl-11-5                                              11.5.62-1                             amd64        CUDA CCCL
ii  cuda-command-line-tools-11-5                                11.5.1-1                              amd64        CUDA command-line tools
ii  cuda-compiler-11-5                                          11.5.1-1                              amd64        CUDA compiler
ii  cuda-cudart-11-3                                            11.3.109-1                            amd64        CUDA Runtime native Libraries
ii  cuda-cudart-11-4                                            11.4.148-1                            amd64        CUDA Runtime native Libraries
ii  cuda-cudart-11-5                                            11.5.117-1                            amd64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-11-3                                        11.3.109-1                            amd64        CUDA Runtime native dev links, headers
ii  cuda-cudart-dev-11-4                                        11.4.148-1                            amd64        CUDA Runtime native dev links, headers
ii  cuda-cudart-dev-11-5                                        11.5.117-1                            amd64        CUDA Runtime native dev links, headers
ii  cuda-cuobjdump-11-5                                         11.5.119-1                            amd64        CUDA cuobjdump
ii  cuda-cupti-11-5                                             11.5.114-1                            amd64        CUDA profiling tools runtime libs.
ii  cuda-cupti-dev-11-5                                         11.5.114-1                            amd64        CUDA profiling tools interface.
ii  cuda-cuxxfilt-11-5                                          11.5.119-1                            amd64        CUDA cuxxfilt
ii  cuda-demo-suite-11-5                                        11.5.50-1                             amd64        Demo suite for CUDA
ii  cuda-documentation-11-5                                     11.5.114-1                            amd64        CUDA documentation
ii  cuda-driver-dev-11-3                                        11.3.109-1                            amd64        CUDA Driver native dev stub library
ii  cuda-driver-dev-11-4                                        11.4.148-1                            amd64        CUDA Driver native dev stub library
ii  cuda-driver-dev-11-5                                        11.5.117-1                            amd64        CUDA Driver native dev stub library
ii  cuda-drivers                                                495.29.05-1                           amd64        CUDA Driver meta-package, branch-agnostic
ii  cuda-drivers-495                                            495.29.05-1                           amd64        CUDA Driver meta-package, branch-specific
ii  cuda-gdb-11-5                                               11.5.114-1                            amd64        CUDA-GDB
ii  cuda-libraries-11-5                                         11.5.1-1                              amd64        CUDA Libraries 11.5 meta-package
ii  cuda-libraries-dev-11-5                                     11.5.1-1                              amd64        CUDA Libraries 11.5 development meta-package
ii  cuda-memcheck-11-5                                          11.5.114-1                            amd64        CUDA-MEMCHECK
ii  cuda-nsight-11-5                                            11.5.114-1                            amd64        CUDA nsight
ii  cuda-nsight-compute-11-5                                    11.5.1-1                              amd64        NVIDIA Nsight Compute
ii  cuda-nsight-systems-11-5                                    11.5.1-1                              amd64        NVIDIA Nsight Systems
ii  cuda-nvcc-11-3                                              11.3.109-1                            amd64        CUDA nvcc
ii  cuda-nvcc-11-5                                              11.5.119-1                            amd64        CUDA nvcc
ii  cuda-nvdisasm-11-5                                          11.5.119-1                            amd64        CUDA disassembler
ii  cuda-nvml-dev-11-5                                          11.5.50-1                             amd64        NVML native dev links, headers
ii  cuda-nvprof-11-5                                            11.5.114-1                            amd64        CUDA Profiler tools
ii  cuda-nvprune-11-5                                           11.5.119-1                            amd64        CUDA nvprune
ii  cuda-nvrtc-11-3                                             11.3.109-1                            amd64        NVRTC native runtime libraries
ii  cuda-nvrtc-11-5                                             11.5.119-1                            amd64        NVRTC native runtime libraries
ii  cuda-nvrtc-dev-11-3                                         11.3.109-1                            amd64        NVRTC native dev links, headers
ii  cuda-nvrtc-dev-11-5                                         11.5.119-1                            amd64        NVRTC native dev links, headers
ii  cuda-nvtx-11-5                                              11.5.114-1                            amd64        NVIDIA Tools Extension
ii  cuda-nvvp-11-5                                              11.5.114-1                            amd64        CUDA Profiler tools
ii  cuda-repo-ubuntu2004-11-4-local                             11.4.1-470.57.02-1                    amd64        cuda repository configuration files
ii  cuda-repo-ubuntu2004-11-5-local                             11.5.0-495.29.05-1                    amd64        cuda repository configuration files
ii  cuda-runtime-11-5                                           11.5.1-1                              amd64        CUDA Runtime 11.5 meta-package
ii  cuda-samples-11-5                                           11.5.56-1                             amd64        CUDA example applications
ii  cuda-sanitizer-11-5                                         11.5.114-1                            amd64        CUDA Sanitizer
ii  cuda-thrust-11-3                                            11.3.109-1                            amd64        CUDA Thrust
ii  cuda-toolkit-11-3-config-common                             11.3.109-1                            all          Common config package for CUDA Toolkit 11.3.
ii  cuda-toolkit-11-4-config-common                             11.4.148-1                            all          Common config package for CUDA Toolkit 11.4.
ii  cuda-toolkit-11-5                                           11.5.1-1                              amd64        CUDA Toolkit 11.5 meta-package
ii  cuda-toolkit-11-5-config-common                             11.5.117-1                            all          Common config package for CUDA Toolkit 11.5.
ii  cuda-toolkit-11-config-common                               11.5.117-1                            all          Common config package for CUDA Toolkit 11.
ii  cuda-toolkit-config-common                                  11.5.117-1                            all          Common config package for CUDA Toolkit.
ii  cuda-tools-11-5                                             11.5.1-1                              amd64        CUDA Tools meta-package
ii  cuda-visual-tools-11-5                                      11.5.1-1                              amd64        CUDA visual tools
ii  graphsurgeon-tf                                             8.0.1-1+cuda11.3                      amd64        GraphSurgeon for TensorRT package
ii  libcudart10.1:amd64                                         10.1.243-3                            amd64        NVIDIA CUDA Runtime Library
ii  libcudnn8                                                   8.3.1.22-1+cuda11.5                   amd64        cuDNN runtime libraries
ii  libcudnn8-dev                                               8.3.1.22-1+cuda11.5                   amd64        cuDNN development libraries and headers
ii  libcudnn8-samples                                           8.3.0.98-1+cuda11.5                   amd64        cuDNN documents and samples
ii  libnccl-dev                                                 2.9.9-1+cuda11.3                      amd64        NVIDIA Collective Communication Library (NCCL) Development Files
ii  libnccl2                                                    2.9.9-1+cuda11.3                      amd64        NVIDIA Collective Communication Library (NCCL) Runtime
ii  libnvinfer-bin                                              8.0.1-1+cuda11.3                      amd64        TensorRT binaries
ii  libnvinfer-dev                                              8.0.1-1+cuda11.3                      amd64        TensorRT development libraries and headers
ii  libnvinfer-doc                                              8.0.1-1+cuda11.3                      all          TensorRT documentation
ii  libnvinfer-plugin-dev                                       8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-plugin8                                          8.0.1-1+cuda11.3                      amd64        TensorRT plugin libraries
ii  libnvinfer-samples                                          8.0.1-1+cuda11.3                      all          TensorRT samples
ii  libnvinfer8                                                 8.0.1-1+cuda11.3                      amd64        TensorRT runtime libraries
ii  libnvonnxparsers-dev                                        8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvonnxparsers8                                           8.0.1-1+cuda11.3                      amd64        TensorRT ONNX libraries
ii  libnvparsers-dev                                            8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
ii  libnvparsers8                                               8.0.1-1+cuda11.3                      amd64        TensorRT parsers libraries
ii  nccl-local-repo-ubuntu2004-2.11.4-cuda11.0                  1.0-1                                 amd64        nccl-local repository configuration files
ii  nv-tensorrt-repo-ubuntu1804-cuda11.3-trt8.0.1.6-ga-20210626 1-1                                   amd64        nv-tensorrt repository configuration files
ii  nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.1.8-ga-20211117 1-1                                   amd64        nv-tensorrt repository configuration files
ii  nvidia-cuda-dev                                             10.1.243-3                            amd64        NVIDIA CUDA development files
ii  nvidia-cuda-doc                                             10.1.243-3                            all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                                             10.1.243-3                            amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                                         10.1.243-3                            amd64        NVIDIA CUDA development toolkit
ii  onnx-graphsurgeon                                           8.0.1-1+cuda11.3                      amd64        ONNX GraphSurgeon for TensorRT package

To narrow down, under the non-virtual environment, can you follow GitHub - NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream to deploy your custom_lprnet.engine?

Correct! I can.

But it’s not what I need. I don’t want to use DeepStream or GStreamer.

For other inference ways, refer to GitHub - NVIDIA-AI-IOT/tao-toolkit-triton-apps at dev-morgan/add-lprnet-triton or Python run LPRNet with TensorRT show pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory - #8 by Morganh or Not Getting Correct output while running inference using TensorRT on LPRnet fp16 Model

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.