deserializeCudaEngine failed. Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match

Description

When I try to deserialize a TRT engine file, I get the following error message.

ERROR: 1: [stdArchiveReader.cpp::StdArchiveReader::32] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)

The TRT engine file was generated from ONNX using trtexec.

The ONNX model file was generated from yolov5s.pt using export.py from the yolov5 repository.

I also tried to generate the engine using TensorRT's API. It does not work; I got the same error.

IBuilder *builder = createInferBuilder(logger);
INetworkDefinition *network = builder->createNetworkV2(1U << int(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
IBuilderConfig *config = builder->createBuilderConfig();
IRuntime *runtime = createInferRuntime(logger);
ICudaEngine *engine = nullptr;
IParser *parser = createParser(*network, logger);
assert(parser->parseFromFile(onnxFile.c_str(), int(logger.reportableSeverity)));

cout << "Succeeded parsing from onnx file '" << onnxFile << "'" << endl;

IHostMemory *engineString = builder->buildSerializedNetwork(*network, *config);
assert(engineString != nullptr && engineString->size() != 0);
cout << "Succeeded building serialized engine!" << endl;

ofstream engineFile(trtFile, ios_base::binary | ios_base::out);
assert(!engineFile.fail());
engineFile.write(static_cast<char *>(engineString->data()), engineString->size());

engineFile.close();

The source code I used to deserialize the TRT engine file:

ifstream file(trtFile, ios_base::in | ios_base::binary);
assert(file.good());
file.seekg(0, ios::end);
auto size = file.tellg();
char *engineString = new char[size];
assert(engineString);
file.read(engineString, size);
file.close();
// initLibNvInferPlugins(this->logger, "");
this->runtime = createInferRuntime(*this->logger);
assert(this->runtime);

this->engine = this->runtime->deserializeCudaEngine((void*)engineString, size);
assert(this->engine);

Environment

The CUDA, cuDNN, and TensorRT packages I use come from developer.download.nvidia.cn.
I have checked the versions of all meta packages required by TensorRT, CUDA, and cuDNN.
TensorRT Version: 8.5.3.1-1+cuda11.8
GPU: NVIDIA GeForce RTX 3060 Ti
Nvidia Driver Version: nvidia-driver-530 530.30.02-0ubuntu1 amd64
CUDA Version: cuda-drivers-530 530.30.02-1 amd64 cuda-driver-dev-11-8 11.8.89-1 amd64
CUDNN Version: libcudnn8 8.8.1.3-1+cuda11.8 amd64
Operating System + Version: Ubuntu 18.04 LTS (up to date)
Python Version: 3.7
PyTorch Version: 1.13.1

Relevant Files

  1. The ONNX file I generated: yolov5s.onnx (14.0 MB)
  2. The TRT engine generated with trtexec: yolov5s_trtexec.trt (31.9 MB)
  3. The TRT engine file generated with TensorRT's API: yolov5s.trt (32.1 MB)

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Alongside, you can try a few things:

  1. validating your model with the below snippet

check_model.py

import sys
import onnx
filename = yourONNXmodel
model = onnx.load(filename)
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, request you to share the trtexec "--verbose" log for further debugging.
Thanks!

The ONNX model was already shared in my previous message.

  1. I tried check_model.py, but it produced no output. (onnx.checker.check_model is silent on success and raises an exception only when the model is invalid.)
  2. Here is the output of trtexec:
&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # ./trtexec --onnx=./yolov5s.onnx --saveEngine=y.trt
[03/25/2023-18:55:58] [I] === Model Options ===
[03/25/2023-18:55:58] [I] Format: ONNX
[03/25/2023-18:55:58] [I] Model: ./yolov5s.onnx
[03/25/2023-18:55:58] [I] Output:
[03/25/2023-18:55:58] [I] === Build Options ===
[03/25/2023-18:55:58] [I] Max batch: explicit batch
[03/25/2023-18:55:58] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/25/2023-18:55:58] [I] minTiming: 1
[03/25/2023-18:55:58] [I] avgTiming: 8
[03/25/2023-18:55:58] [I] Precision: FP32
[03/25/2023-18:55:58] [I] LayerPrecisions: 
[03/25/2023-18:55:58] [I] Calibration: 
[03/25/2023-18:55:58] [I] Refit: Disabled
[03/25/2023-18:55:58] [I] Sparsity: Disabled
[03/25/2023-18:55:58] [I] Safe mode: Disabled
[03/25/2023-18:55:58] [I] DirectIO mode: Disabled
[03/25/2023-18:55:58] [I] Restricted mode: Disabled
[03/25/2023-18:55:58] [I] Build only: Disabled
[03/25/2023-18:55:58] [I] Save engine: y.trt
[03/25/2023-18:55:58] [I] Load engine: 
[03/25/2023-18:55:58] [I] Profiling verbosity: 0
[03/25/2023-18:55:58] [I] Tactic sources: Using default tactic sources
[03/25/2023-18:55:58] [I] timingCacheMode: local
[03/25/2023-18:55:58] [I] timingCacheFile: 
[03/25/2023-18:55:58] [I] Heuristic: Disabled
[03/25/2023-18:55:58] [I] Preview Features: Use default preview flags.
[03/25/2023-18:55:58] [I] Input(s)s format: fp32:CHW
[03/25/2023-18:55:58] [I] Output(s)s format: fp32:CHW
[03/25/2023-18:55:58] [I] Input build shapes: model
[03/25/2023-18:55:58] [I] Input calibration shapes: model
[03/25/2023-18:55:58] [I] === System Options ===
[03/25/2023-18:55:58] [I] Device: 0
[03/25/2023-18:55:58] [I] DLACore: 
[03/25/2023-18:55:58] [I] Plugins:
[03/25/2023-18:55:58] [I] === Inference Options ===
[03/25/2023-18:55:58] [I] Batch: Explicit
[03/25/2023-18:55:58] [I] Input inference shapes: model
[03/25/2023-18:55:58] [I] Iterations: 10
[03/25/2023-18:55:58] [I] Duration: 3s (+ 200ms warm up)
[03/25/2023-18:55:58] [I] Sleep time: 0ms
[03/25/2023-18:55:58] [I] Idle time: 0ms
[03/25/2023-18:55:58] [I] Streams: 1
[03/25/2023-18:55:58] [I] ExposeDMA: Disabled
[03/25/2023-18:55:58] [I] Data transfers: Enabled
[03/25/2023-18:55:58] [I] Spin-wait: Disabled
[03/25/2023-18:55:58] [I] Multithreading: Disabled
[03/25/2023-18:55:58] [I] CUDA Graph: Disabled
[03/25/2023-18:55:58] [I] Separate profiling: Disabled
[03/25/2023-18:55:58] [I] Time Deserialize: Disabled
[03/25/2023-18:55:58] [I] Time Refit: Disabled
[03/25/2023-18:55:58] [I] NVTX verbosity: 0
[03/25/2023-18:55:58] [I] Persistent Cache Ratio: 0
[03/25/2023-18:55:58] [I] Inputs:
[03/25/2023-18:55:58] [I] === Reporting Options ===
[03/25/2023-18:55:58] [I] Verbose: Disabled
[03/25/2023-18:55:58] [I] Averages: 10 inferences
[03/25/2023-18:55:58] [I] Percentiles: 90,95,99
[03/25/2023-18:55:58] [I] Dump refittable layers:Disabled
[03/25/2023-18:55:58] [I] Dump output: Disabled
[03/25/2023-18:55:58] [I] Profile: Disabled
[03/25/2023-18:55:58] [I] Export timing to JSON file: 
[03/25/2023-18:55:58] [I] Export output to JSON file: 
[03/25/2023-18:55:58] [I] Export profile to JSON file: 
[03/25/2023-18:55:58] [I] 
[03/25/2023-18:55:58] [I] === Device Information ===
[03/25/2023-18:55:58] [I] Selected Device: NVIDIA GeForce RTX 3060 Ti
[03/25/2023-18:55:58] [I] Compute Capability: 8.6
[03/25/2023-18:55:58] [I] SMs: 38
[03/25/2023-18:55:58] [I] Compute Clock Rate: 1.755 GHz
[03/25/2023-18:55:58] [I] Device Global Memory: 7965 MiB
[03/25/2023-18:55:58] [I] Shared Memory per SM: 100 KiB
[03/25/2023-18:55:58] [I] Memory Bus Width: 256 bits (ECC disabled)
[03/25/2023-18:55:58] [I] Memory Clock Rate: 7.001 GHz
[03/25/2023-18:55:58] [I] 
[03/25/2023-18:55:58] [I] TensorRT version: 8.5.3
[03/25/2023-18:55:58] [I] [TRT] [MemUsageChange] Init CUDA: CPU +571, GPU +0, now: CPU 584, GPU 669 (MiB)
[03/25/2023-18:56:00] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +542, GPU +116, now: CPU 1178, GPU 785 (MiB)
[03/25/2023-18:56:00] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[03/25/2023-18:56:00] [I] Start parsing network model
[03/25/2023-18:56:00] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-18:56:00] [I] [TRT] Input filename:   ./yolov5s.onnx
[03/25/2023-18:56:00] [I] [TRT] ONNX IR version:  0.0.7
[03/25/2023-18:56:00] [I] [TRT] Opset version:    12
[03/25/2023-18:56:00] [I] [TRT] Producer name:    pytorch
[03/25/2023-18:56:00] [I] [TRT] Producer version: 1.12.1
[03/25/2023-18:56:00] [I] [TRT] Domain:           
[03/25/2023-18:56:00] [I] [TRT] Model version:    0
[03/25/2023-18:56:00] [I] [TRT] Doc string:       
[03/25/2023-18:56:00] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-18:56:00] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/25/2023-18:56:00] [I] Finish parsing network model
[03/25/2023-18:56:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1287, GPU +362, now: CPU 2498, GPU 1147 (MiB)
[03/25/2023-18:56:00] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +246, GPU +58, now: CPU 2744, GPU 1205 (MiB)
[03/25/2023-18:56:00] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/25/2023-18:57:20] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[03/25/2023-18:57:20] [I] [TRT] Total Activation Memory: 8605349888
[03/25/2023-18:57:20] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[03/25/2023-18:57:20] [I] [TRT] Total Host Persistent Memory: 84176
[03/25/2023-18:57:20] [I] [TRT] Total Device Persistent Memory: 231936
[03/25/2023-18:57:20] [I] [TRT] Total Scratch Memory: 134217728
[03/25/2023-18:57:20] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 7 MiB, GPU 4367 MiB
[03/25/2023-18:57:20] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 195 steps to complete.
[03/25/2023-18:57:20] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 18.1328ms to assign 15 blocks to 195 nodes requiring 156228096 bytes.
[03/25/2023-18:57:20] [I] [TRT] Total Activation Memory: 156228096
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 3594, GPU 1545 (MiB)
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +3, GPU +31, now: CPU 3, GPU 31 (MiB)
[03/25/2023-18:57:20] [I] Engine built in 81.9838 sec.
[03/25/2023-18:57:20] [I] [TRT] Loaded engine size: 31 MiB
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 3067, GPU 1411 (MiB)
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +30, now: CPU 0, GPU 30 (MiB)
[03/25/2023-18:57:20] [I] Engine deserialized in 0.0126936 sec.
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3067, GPU 1411 (MiB)
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +149, now: CPU 0, GPU 179 (MiB)
[03/25/2023-18:57:20] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[03/25/2023-18:57:20] [I] Setting persistentCacheLimit to 0 bytes.
[03/25/2023-18:57:20] [I] Using random values for input onnx::Cast_0
[03/25/2023-18:57:20] [I] Created input binding for onnx::Cast_0 with dimensions 1x3x640x640
[03/25/2023-18:57:20] [I] Using random values for output 630
[03/25/2023-18:57:20] [I] Created output binding for 630 with dimensions 1x25200x85
[03/25/2023-18:57:20] [I] Starting inference
[03/25/2023-18:57:23] [I] Warmup completed 66 queries over 200 ms
[03/25/2023-18:57:23] [I] Timing trace has 1004 queries over 3.01039 s
[03/25/2023-18:57:23] [I] 
[03/25/2023-18:57:23] [I] === Trace details ===
[03/25/2023-18:57:23] [I] Trace averages of 10 runs:
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00268 ms - Host latency: 3.64388 ms (enqueue 1.35774 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00237 ms - Host latency: 3.64576 ms (enqueue 1.36225 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00421 ms - Host latency: 3.63826 ms (enqueue 1.35391 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00594 ms - Host latency: 3.62881 ms (enqueue 1.33792 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00575 ms - Host latency: 3.63828 ms (enqueue 1.36564 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98281 ms - Host latency: 3.6037 ms (enqueue 1.35683 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98025 ms - Host latency: 3.61531 ms (enqueue 1.34596 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.03319 ms - Host latency: 3.67264 ms (enqueue 1.34447 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.12074 ms - Host latency: 3.74781 ms (enqueue 1.39609 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98557 ms - Host latency: 3.62251 ms (enqueue 1.35918 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97882 ms - Host latency: 3.60628 ms (enqueue 1.35126 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98005 ms - Host latency: 3.60111 ms (enqueue 1.34577 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98351 ms - Host latency: 3.61732 ms (enqueue 1.34944 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9821 ms - Host latency: 3.61712 ms (enqueue 1.34304 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98332 ms - Host latency: 3.63346 ms (enqueue 1.35293 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98221 ms - Host latency: 3.61263 ms (enqueue 1.33514 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98353 ms - Host latency: 3.61492 ms (enqueue 1.33793 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98065 ms - Host latency: 3.61238 ms (enqueue 1.35081 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9823 ms - Host latency: 3.61409 ms (enqueue 1.34128 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98219 ms - Host latency: 3.60547 ms (enqueue 1.34345 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98014 ms - Host latency: 3.61153 ms (enqueue 1.34828 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98393 ms - Host latency: 3.62723 ms (enqueue 1.34292 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98301 ms - Host latency: 3.61342 ms (enqueue 1.34311 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98271 ms - Host latency: 3.60798 ms (enqueue 1.3453 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98066 ms - Host latency: 3.61592 ms (enqueue 1.35807 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9826 ms - Host latency: 3.59905 ms (enqueue 1.35049 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98168 ms - Host latency: 3.60786 ms (enqueue 1.34774 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.02327 ms - Host latency: 3.65791 ms (enqueue 1.34291 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.13108 ms - Host latency: 3.76853 ms (enqueue 1.39824 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98169 ms - Host latency: 3.60955 ms (enqueue 1.34359 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98199 ms - Host latency: 3.63452 ms (enqueue 1.34982 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98077 ms - Host latency: 3.60051 ms (enqueue 1.34521 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98163 ms - Host latency: 3.60713 ms (enqueue 1.34738 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98363 ms - Host latency: 3.61511 ms (enqueue 1.34919 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98423 ms - Host latency: 3.6106 ms (enqueue 1.34349 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98262 ms - Host latency: 3.61565 ms (enqueue 1.33793 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98373 ms - Host latency: 3.61611 ms (enqueue 1.34597 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9823 ms - Host latency: 3.61178 ms (enqueue 1.35188 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97985 ms - Host latency: 3.5972 ms (enqueue 1.35369 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98279 ms - Host latency: 3.61038 ms (enqueue 1.34532 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98251 ms - Host latency: 3.60774 ms (enqueue 1.3387 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98199 ms - Host latency: 3.60171 ms (enqueue 1.32628 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98361 ms - Host latency: 3.62836 ms (enqueue 1.32795 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98353 ms - Host latency: 3.61538 ms (enqueue 1.33948 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97931 ms - Host latency: 3.61067 ms (enqueue 1.34053 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97964 ms - Host latency: 3.59666 ms (enqueue 1.34465 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98278 ms - Host latency: 3.61497 ms (enqueue 1.34336 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.0295 ms - Host latency: 3.65892 ms (enqueue 1.34763 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.09833 ms - Host latency: 3.72833 ms (enqueue 1.39119 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98231 ms - Host latency: 3.60315 ms (enqueue 1.35411 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98446 ms - Host latency: 3.61978 ms (enqueue 1.34379 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98309 ms - Host latency: 3.62393 ms (enqueue 1.34669 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98147 ms - Host latency: 3.61221 ms (enqueue 1.3424 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98341 ms - Host latency: 3.61759 ms (enqueue 1.34286 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98251 ms - Host latency: 3.61531 ms (enqueue 1.3425 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97858 ms - Host latency: 3.6001 ms (enqueue 1.35293 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98273 ms - Host latency: 3.6028 ms (enqueue 1.34374 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9835 ms - Host latency: 3.6109 ms (enqueue 1.34844 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98135 ms - Host latency: 3.61459 ms (enqueue 1.3537 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98126 ms - Host latency: 3.62198 ms (enqueue 1.34771 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98289 ms - Host latency: 3.61235 ms (enqueue 1.33959 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98165 ms - Host latency: 3.60614 ms (enqueue 1.34415 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98135 ms - Host latency: 3.6084 ms (enqueue 1.35081 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98179 ms - Host latency: 3.62878 ms (enqueue 1.31062 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9811 ms - Host latency: 3.60444 ms (enqueue 1.27759 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98188 ms - Host latency: 3.62107 ms (enqueue 1.28445 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98389 ms - Host latency: 3.61699 ms (enqueue 1.27905 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.02979 ms - Host latency: 3.65498 ms (enqueue 1.27449 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.11414 ms - Host latency: 3.74292 ms (enqueue 1.29788 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98218 ms - Host latency: 3.60635 ms (enqueue 1.36624 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98347 ms - Host latency: 3.60325 ms (enqueue 1.33906 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98401 ms - Host latency: 3.61831 ms (enqueue 1.34968 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98474 ms - Host latency: 3.61946 ms (enqueue 1.35396 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98296 ms - Host latency: 3.6145 ms (enqueue 1.34971 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98213 ms - Host latency: 3.60479 ms (enqueue 1.35981 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9842 ms - Host latency: 3.60222 ms (enqueue 1.43701 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98311 ms - Host latency: 3.62256 ms (enqueue 1.3583 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98306 ms - Host latency: 3.60178 ms (enqueue 1.34956 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98225 ms - Host latency: 3.61638 ms (enqueue 1.34988 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97942 ms - Host latency: 3.60867 ms (enqueue 1.34309 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98086 ms - Host latency: 3.61577 ms (enqueue 1.34182 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98215 ms - Host latency: 3.60823 ms (enqueue 1.34751 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98311 ms - Host latency: 3.60562 ms (enqueue 1.35125 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98088 ms - Host latency: 3.61096 ms (enqueue 1.3593 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98198 ms - Host latency: 3.60869 ms (enqueue 1.2707 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98159 ms - Host latency: 3.60981 ms (enqueue 1.34072 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98 ms - Host latency: 3.60945 ms (enqueue 1.3375 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.02385 ms - Host latency: 3.64678 ms (enqueue 1.34099 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.11975 ms - Host latency: 3.74973 ms (enqueue 1.39148 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98125 ms - Host latency: 3.61636 ms (enqueue 1.37041 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98145 ms - Host latency: 3.61848 ms (enqueue 1.35737 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98228 ms - Host latency: 3.60535 ms (enqueue 1.34041 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98037 ms - Host latency: 3.61711 ms (enqueue 1.34119 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9802 ms - Host latency: 3.60479 ms (enqueue 1.34941 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98218 ms - Host latency: 3.62153 ms (enqueue 1.34263 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98218 ms - Host latency: 3.61616 ms (enqueue 1.34993 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98235 ms - Host latency: 3.61348 ms (enqueue 1.35571 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98098 ms - Host latency: 3.60767 ms (enqueue 1.33418 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9832 ms - Host latency: 3.61348 ms (enqueue 1.33994 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97966 ms - Host latency: 3.6061 ms (enqueue 1.34731 ms)
[03/25/2023-18:57:23] [I] 
[03/25/2023-18:57:23] [I] === Performance summary ===
[03/25/2023-18:57:23] [I] Throughput: 333.512 qps
[03/25/2023-18:57:23] [I] Latency: min = 3.55615 ms, max = 4.5824 ms, mean = 3.6227 ms, median = 3.61157 ms, percentile(90%) = 3.63989 ms, percentile(95%) = 3.65918 ms, percentile(99%) = 4.00616 ms
[03/25/2023-18:57:23] [I] Enqueue Time: min = 1.16992 ms, max = 1.65308 ms, mean = 1.34567 ms, median = 1.34741 ms, percentile(90%) = 1.37201 ms, percentile(95%) = 1.38245 ms, percentile(99%) = 1.44165 ms
[03/25/2023-18:57:23] [I] H2D Latency: min = 0.203857 ms, max = 0.229248 ms, mean = 0.209415 ms, median = 0.209229 ms, percentile(90%) = 0.212158 ms, percentile(95%) = 0.213135 ms, percentile(99%) = 0.215088 ms
[03/25/2023-18:57:23] [I] GPU Compute Time: min = 2.96533 ms, max = 3.95679 ms, mean = 2.99217 ms, median = 2.98169 ms, percentile(90%) = 2.99829 ms, percentile(95%) = 3.00543 ms, percentile(99%) = 3.38525 ms
[03/25/2023-18:57:23] [I] D2H Latency: min = 0.368896 ms, max = 0.645508 ms, mean = 0.421114 ms, median = 0.418671 ms, percentile(90%) = 0.439575 ms, percentile(95%) = 0.447998 ms, percentile(99%) = 0.525146 ms
[03/25/2023-18:57:23] [I] Total Host Walltime: 3.01039 s
[03/25/2023-18:57:23] [I] Total GPU Compute Time: 3.00414 s
[03/25/2023-18:57:23] [W] * GPU compute time is unstable, with coefficient of variance = 2.26049%.
[03/25/2023-18:57:23] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[03/25/2023-18:57:23] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/25/2023-18:57:23] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8503] # ./trtexec --onnx=./yolov5s.onnx --saveEngine=y.trt

I have fixed the issue; forgive my stupidity. The read position was still at the end of the file, so the buffer was never filled:

ifstream file(trtFile, ios_base::in | ios_base::binary);
assert(file.good());
file.seekg(0, ios::end);
auto size = file.tellg();
cout << size << endl;
char *engineString = new char[size];
assert(engineString);
file.seekg(0, file.beg); // just add this line: rewind to the beginning before reading
file.read(engineString, size);
file.close();

I met the same problem as you; could you tell me how you figured it out?