deserializeCudaEngine failed. Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match

cqzw555 · March 20, 2023, 12:11pm

Description

When I try to deserialize a TensorRT engine file, I get the following error message.

ERROR: 1: [stdArchiveReader.cpp::StdArchiveReader::32] Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed.Magic tag does not match)
ERROR: 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)

The TRT engine file was generated from ONNX using trtexec.

The ONNX model file was generated from yolov5s.pt using export.py from yolov5.

I also tried generating the engine with the TensorRT API. It did not work either; I got the same error.

IBuilder *builder = createInferBuilder(logger);
    INetworkDefinition *network = builder->createNetworkV2(1U << int(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    IBuilderConfig *config = builder->createBuilderConfig();
    IRuntime *runtime = createInferRuntime(logger);
    ICudaEngine *engine = nullptr;
    IParser *parser = createParser(*network, logger);
    // Keep the call outside assert() so it still runs when NDEBUG is defined.
    bool parsed = parser->parseFromFile(onnxFile.c_str(), int(logger.reportableSeverity));
    assert(parsed);

    cout << "Succeeded parsing from onnx file \'" << onnxFile << "\'" << endl;

    IHostMemory *engineString = builder->buildSerializedNetwork(*network, *config);
    assert(engineString != nullptr && engineString->size() != 0);
    cout << "Succeeded building serialized engine!" << endl;

    ofstream engineFile(trtFile, ios_base::binary | ios_base::out);
    assert(!engineFile.fail());
    engineFile.write(static_cast<char *>(engineString->data()), engineString->size());

    engineFile.close();
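
Since both the trtexec build and the API build were suspected here, one way to rule out a truncated or failed write on the build side is to compare the size on disk with the size of the serialized engine. The following is only a sketch using the standard library; `writeAndVerify` is a hypothetical helper name, not something TensorRT provides.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (not a TensorRT API): write `size` bytes to `path`,
// then reopen the file and confirm that its size on disk matches.
bool writeAndVerify(const std::string& path, const void* data, std::size_t size) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::out | std::ios::trunc);
        if (!out) {
            return false;  // could not open the file for writing
        }
        out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
        out.close();
        if (out.fail()) {
            return false;  // write or flush failed
        }
    }
    // std::ios::ate opens the file positioned at its end, so tellg() is the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return false;
    }
    return static_cast<std::streamoff>(in.tellg()) == static_cast<std::streamoff>(size);
}

// Usage with the names from the build code above:
//   bool ok = writeAndVerify(trtFile, engineString->data(), engineString->size());
//   assert(ok);
```

If this check passes, the file on disk has the expected length and the problem is more likely on the reading side.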

The source code I use to deserialize the TRT engine file:

ifstream file(trtFile, ios_base::in |ios_base::binary);
    assert(file.good());
    file.seekg(0, ios::end);
    auto size = file.tellg();
    char *engineString = new char[size];
    assert(engineString);
    file.read(engineString, size);
    file.close();
    // initLibNvInferPlugins(this->logger, "");
    this->runtime = createInferRuntime(*this->logger);
    assert(this->runtime);

    this->engine = this->runtime->deserializeCudaEngine((void*)engineString, size);
    assert(this->engine);

Environment

The CUDA, cuDNN, and TensorRT packages I use come from developer.download.nvidia.cn.
I have checked the versions of all the meta packages required by TensorRT, CUDA, and cuDNN.
TensorRT Version: 8.5.3.1-1+cuda11.8
GPU: 3060ti
Nvidia Driver Version: nvidia-driver-530 530.30.02-0ubuntu1 amd64
CUDA Version: cuda-drivers-530 530.30.02-1 amd64 cuda-driver-dev-11-8 11.8.89-1 amd64
CUDNN Version: libcudnn8 8.8.1.3-1+cuda11.8 amd64
Operating System + Version: ubuntu18 lts (up to date)
Python Version: 3.7
PyTorch Version: 1.13.1

Relevant Files

The ONNX file I generated: yolov5s.onnx (14.0 MB)
The TRT engine generated by trtexec: yolov5s_trtexec.trt (31.9 MB)
The TRT engine file generated with the TensorRT API: yolov5s.trt (32.1 MB)

NVES · March 20, 2023, 12:37pm

Hi,
Please share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

1) Validate your model with the snippet below.

check_model.py

import sys
import onnx
filename = "yourONNXmodel"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

2) Try running your model with the trtexec command.

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

cqzw555 · March 25, 2023, 11:05am

I already shared the ONNX model in my previous message.

I tried check_model.py, but it did not produce any output.
Here is the output of trtexec:

&&&& RUNNING TensorRT.trtexec [TensorRT v8503] # ./trtexec --onnx=./yolov5s.onnx --saveEngine=y.trt
[03/25/2023-18:55:58] [I] === Model Options ===
[03/25/2023-18:55:58] [I] Format: ONNX
[03/25/2023-18:55:58] [I] Model: ./yolov5s.onnx
[03/25/2023-18:55:58] [I] Output:
[03/25/2023-18:55:58] [I] === Build Options ===
[03/25/2023-18:55:58] [I] Max batch: explicit batch
[03/25/2023-18:55:58] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[03/25/2023-18:55:58] [I] minTiming: 1
[03/25/2023-18:55:58] [I] avgTiming: 8
[03/25/2023-18:55:58] [I] Precision: FP32
[03/25/2023-18:55:58] [I] LayerPrecisions: 
[03/25/2023-18:55:58] [I] Calibration: 
[03/25/2023-18:55:58] [I] Refit: Disabled
[03/25/2023-18:55:58] [I] Sparsity: Disabled
[03/25/2023-18:55:58] [I] Safe mode: Disabled
[03/25/2023-18:55:58] [I] DirectIO mode: Disabled
[03/25/2023-18:55:58] [I] Restricted mode: Disabled
[03/25/2023-18:55:58] [I] Build only: Disabled
[03/25/2023-18:55:58] [I] Save engine: y.trt
[03/25/2023-18:55:58] [I] Load engine: 
[03/25/2023-18:55:58] [I] Profiling verbosity: 0
[03/25/2023-18:55:58] [I] Tactic sources: Using default tactic sources
[03/25/2023-18:55:58] [I] timingCacheMode: local
[03/25/2023-18:55:58] [I] timingCacheFile: 
[03/25/2023-18:55:58] [I] Heuristic: Disabled
[03/25/2023-18:55:58] [I] Preview Features: Use default preview flags.
[03/25/2023-18:55:58] [I] Input(s)s format: fp32:CHW
[03/25/2023-18:55:58] [I] Output(s)s format: fp32:CHW
[03/25/2023-18:55:58] [I] Input build shapes: model
[03/25/2023-18:55:58] [I] Input calibration shapes: model
[03/25/2023-18:55:58] [I] === System Options ===
[03/25/2023-18:55:58] [I] Device: 0
[03/25/2023-18:55:58] [I] DLACore: 
[03/25/2023-18:55:58] [I] Plugins:
[03/25/2023-18:55:58] [I] === Inference Options ===
[03/25/2023-18:55:58] [I] Batch: Explicit
[03/25/2023-18:55:58] [I] Input inference shapes: model
[03/25/2023-18:55:58] [I] Iterations: 10
[03/25/2023-18:55:58] [I] Duration: 3s (+ 200ms warm up)
[03/25/2023-18:55:58] [I] Sleep time: 0ms
[03/25/2023-18:55:58] [I] Idle time: 0ms
[03/25/2023-18:55:58] [I] Streams: 1
[03/25/2023-18:55:58] [I] ExposeDMA: Disabled
[03/25/2023-18:55:58] [I] Data transfers: Enabled
[03/25/2023-18:55:58] [I] Spin-wait: Disabled
[03/25/2023-18:55:58] [I] Multithreading: Disabled
[03/25/2023-18:55:58] [I] CUDA Graph: Disabled
[03/25/2023-18:55:58] [I] Separate profiling: Disabled
[03/25/2023-18:55:58] [I] Time Deserialize: Disabled
[03/25/2023-18:55:58] [I] Time Refit: Disabled
[03/25/2023-18:55:58] [I] NVTX verbosity: 0
[03/25/2023-18:55:58] [I] Persistent Cache Ratio: 0
[03/25/2023-18:55:58] [I] Inputs:
[03/25/2023-18:55:58] [I] === Reporting Options ===
[03/25/2023-18:55:58] [I] Verbose: Disabled
[03/25/2023-18:55:58] [I] Averages: 10 inferences
[03/25/2023-18:55:58] [I] Percentiles: 90,95,99
[03/25/2023-18:55:58] [I] Dump refittable layers:Disabled
[03/25/2023-18:55:58] [I] Dump output: Disabled
[03/25/2023-18:55:58] [I] Profile: Disabled
[03/25/2023-18:55:58] [I] Export timing to JSON file: 
[03/25/2023-18:55:58] [I] Export output to JSON file: 
[03/25/2023-18:55:58] [I] Export profile to JSON file: 
[03/25/2023-18:55:58] [I] 
[03/25/2023-18:55:58] [I] === Device Information ===
[03/25/2023-18:55:58] [I] Selected Device: NVIDIA GeForce RTX 3060 Ti
[03/25/2023-18:55:58] [I] Compute Capability: 8.6
[03/25/2023-18:55:58] [I] SMs: 38
[03/25/2023-18:55:58] [I] Compute Clock Rate: 1.755 GHz
[03/25/2023-18:55:58] [I] Device Global Memory: 7965 MiB
[03/25/2023-18:55:58] [I] Shared Memory per SM: 100 KiB
[03/25/2023-18:55:58] [I] Memory Bus Width: 256 bits (ECC disabled)
[03/25/2023-18:55:58] [I] Memory Clock Rate: 7.001 GHz
[03/25/2023-18:55:58] [I] 
[03/25/2023-18:55:58] [I] TensorRT version: 8.5.3
[03/25/2023-18:55:58] [I] [TRT] [MemUsageChange] Init CUDA: CPU +571, GPU +0, now: CPU 584, GPU 669 (MiB)
[03/25/2023-18:56:00] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +542, GPU +116, now: CPU 1178, GPU 785 (MiB)
[03/25/2023-18:56:00] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[03/25/2023-18:56:00] [I] Start parsing network model
[03/25/2023-18:56:00] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-18:56:00] [I] [TRT] Input filename:   ./yolov5s.onnx
[03/25/2023-18:56:00] [I] [TRT] ONNX IR version:  0.0.7
[03/25/2023-18:56:00] [I] [TRT] Opset version:    12
[03/25/2023-18:56:00] [I] [TRT] Producer name:    pytorch
[03/25/2023-18:56:00] [I] [TRT] Producer version: 1.12.1
[03/25/2023-18:56:00] [I] [TRT] Domain:           
[03/25/2023-18:56:00] [I] [TRT] Model version:    0
[03/25/2023-18:56:00] [I] [TRT] Doc string:       
[03/25/2023-18:56:00] [I] [TRT] ----------------------------------------------------------------
[03/25/2023-18:56:00] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/25/2023-18:56:00] [I] Finish parsing network model
[03/25/2023-18:56:00] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1287, GPU +362, now: CPU 2498, GPU 1147 (MiB)
[03/25/2023-18:56:00] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +246, GPU +58, now: CPU 2744, GPU 1205 (MiB)
[03/25/2023-18:56:00] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/25/2023-18:57:20] [I] [TRT] [GraphReduction] The approximate region cut reduction algorithm is called.
[03/25/2023-18:57:20] [I] [TRT] Total Activation Memory: 8605349888
[03/25/2023-18:57:20] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[03/25/2023-18:57:20] [I] [TRT] Total Host Persistent Memory: 84176
[03/25/2023-18:57:20] [I] [TRT] Total Device Persistent Memory: 231936
[03/25/2023-18:57:20] [I] [TRT] Total Scratch Memory: 134217728
[03/25/2023-18:57:20] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 7 MiB, GPU 4367 MiB
[03/25/2023-18:57:20] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 195 steps to complete.
[03/25/2023-18:57:20] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 18.1328ms to assign 15 blocks to 195 nodes requiring 156228096 bytes.
[03/25/2023-18:57:20] [I] [TRT] Total Activation Memory: 156228096
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 3594, GPU 1545 (MiB)
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +3, GPU +31, now: CPU 3, GPU 31 (MiB)
[03/25/2023-18:57:20] [I] Engine built in 81.9838 sec.
[03/25/2023-18:57:20] [I] [TRT] Loaded engine size: 31 MiB
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 3067, GPU 1411 (MiB)
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +30, now: CPU 0, GPU 30 (MiB)
[03/25/2023-18:57:20] [I] Engine deserialized in 0.0126936 sec.
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 3067, GPU 1411 (MiB)
[03/25/2023-18:57:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +149, now: CPU 0, GPU 179 (MiB)
[03/25/2023-18:57:20] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[03/25/2023-18:57:20] [I] Setting persistentCacheLimit to 0 bytes.
[03/25/2023-18:57:20] [I] Using random values for input onnx::Cast_0
[03/25/2023-18:57:20] [I] Created input binding for onnx::Cast_0 with dimensions 1x3x640x640
[03/25/2023-18:57:20] [I] Using random values for output 630
[03/25/2023-18:57:20] [I] Created output binding for 630 with dimensions 1x25200x85
[03/25/2023-18:57:20] [I] Starting inference
[03/25/2023-18:57:23] [I] Warmup completed 66 queries over 200 ms
[03/25/2023-18:57:23] [I] Timing trace has 1004 queries over 3.01039 s
[03/25/2023-18:57:23] [I] 
[03/25/2023-18:57:23] [I] === Trace details ===
[03/25/2023-18:57:23] [I] Trace averages of 10 runs:
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00268 ms - Host latency: 3.64388 ms (enqueue 1.35774 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00237 ms - Host latency: 3.64576 ms (enqueue 1.36225 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00421 ms - Host latency: 3.63826 ms (enqueue 1.35391 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00594 ms - Host latency: 3.62881 ms (enqueue 1.33792 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.00575 ms - Host latency: 3.63828 ms (enqueue 1.36564 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98281 ms - Host latency: 3.6037 ms (enqueue 1.35683 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98025 ms - Host latency: 3.61531 ms (enqueue 1.34596 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.03319 ms - Host latency: 3.67264 ms (enqueue 1.34447 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.12074 ms - Host latency: 3.74781 ms (enqueue 1.39609 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98557 ms - Host latency: 3.62251 ms (enqueue 1.35918 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97882 ms - Host latency: 3.60628 ms (enqueue 1.35126 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98005 ms - Host latency: 3.60111 ms (enqueue 1.34577 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98351 ms - Host latency: 3.61732 ms (enqueue 1.34944 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9821 ms - Host latency: 3.61712 ms (enqueue 1.34304 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98332 ms - Host latency: 3.63346 ms (enqueue 1.35293 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98221 ms - Host latency: 3.61263 ms (enqueue 1.33514 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98353 ms - Host latency: 3.61492 ms (enqueue 1.33793 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98065 ms - Host latency: 3.61238 ms (enqueue 1.35081 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9823 ms - Host latency: 3.61409 ms (enqueue 1.34128 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98219 ms - Host latency: 3.60547 ms (enqueue 1.34345 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98014 ms - Host latency: 3.61153 ms (enqueue 1.34828 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98393 ms - Host latency: 3.62723 ms (enqueue 1.34292 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98301 ms - Host latency: 3.61342 ms (enqueue 1.34311 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98271 ms - Host latency: 3.60798 ms (enqueue 1.3453 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98066 ms - Host latency: 3.61592 ms (enqueue 1.35807 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9826 ms - Host latency: 3.59905 ms (enqueue 1.35049 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98168 ms - Host latency: 3.60786 ms (enqueue 1.34774 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.02327 ms - Host latency: 3.65791 ms (enqueue 1.34291 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.13108 ms - Host latency: 3.76853 ms (enqueue 1.39824 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98169 ms - Host latency: 3.60955 ms (enqueue 1.34359 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98199 ms - Host latency: 3.63452 ms (enqueue 1.34982 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98077 ms - Host latency: 3.60051 ms (enqueue 1.34521 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98163 ms - Host latency: 3.60713 ms (enqueue 1.34738 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98363 ms - Host latency: 3.61511 ms (enqueue 1.34919 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98423 ms - Host latency: 3.6106 ms (enqueue 1.34349 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98262 ms - Host latency: 3.61565 ms (enqueue 1.33793 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98373 ms - Host latency: 3.61611 ms (enqueue 1.34597 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9823 ms - Host latency: 3.61178 ms (enqueue 1.35188 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97985 ms - Host latency: 3.5972 ms (enqueue 1.35369 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98279 ms - Host latency: 3.61038 ms (enqueue 1.34532 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98251 ms - Host latency: 3.60774 ms (enqueue 1.3387 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98199 ms - Host latency: 3.60171 ms (enqueue 1.32628 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98361 ms - Host latency: 3.62836 ms (enqueue 1.32795 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98353 ms - Host latency: 3.61538 ms (enqueue 1.33948 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97931 ms - Host latency: 3.61067 ms (enqueue 1.34053 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97964 ms - Host latency: 3.59666 ms (enqueue 1.34465 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98278 ms - Host latency: 3.61497 ms (enqueue 1.34336 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.0295 ms - Host latency: 3.65892 ms (enqueue 1.34763 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.09833 ms - Host latency: 3.72833 ms (enqueue 1.39119 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98231 ms - Host latency: 3.60315 ms (enqueue 1.35411 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98446 ms - Host latency: 3.61978 ms (enqueue 1.34379 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98309 ms - Host latency: 3.62393 ms (enqueue 1.34669 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98147 ms - Host latency: 3.61221 ms (enqueue 1.3424 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98341 ms - Host latency: 3.61759 ms (enqueue 1.34286 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98251 ms - Host latency: 3.61531 ms (enqueue 1.3425 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97858 ms - Host latency: 3.6001 ms (enqueue 1.35293 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98273 ms - Host latency: 3.6028 ms (enqueue 1.34374 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9835 ms - Host latency: 3.6109 ms (enqueue 1.34844 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98135 ms - Host latency: 3.61459 ms (enqueue 1.3537 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98126 ms - Host latency: 3.62198 ms (enqueue 1.34771 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98289 ms - Host latency: 3.61235 ms (enqueue 1.33959 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98165 ms - Host latency: 3.60614 ms (enqueue 1.34415 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98135 ms - Host latency: 3.6084 ms (enqueue 1.35081 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98179 ms - Host latency: 3.62878 ms (enqueue 1.31062 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9811 ms - Host latency: 3.60444 ms (enqueue 1.27759 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98188 ms - Host latency: 3.62107 ms (enqueue 1.28445 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98389 ms - Host latency: 3.61699 ms (enqueue 1.27905 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.02979 ms - Host latency: 3.65498 ms (enqueue 1.27449 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.11414 ms - Host latency: 3.74292 ms (enqueue 1.29788 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98218 ms - Host latency: 3.60635 ms (enqueue 1.36624 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98347 ms - Host latency: 3.60325 ms (enqueue 1.33906 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98401 ms - Host latency: 3.61831 ms (enqueue 1.34968 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98474 ms - Host latency: 3.61946 ms (enqueue 1.35396 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98296 ms - Host latency: 3.6145 ms (enqueue 1.34971 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98213 ms - Host latency: 3.60479 ms (enqueue 1.35981 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9842 ms - Host latency: 3.60222 ms (enqueue 1.43701 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98311 ms - Host latency: 3.62256 ms (enqueue 1.3583 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98306 ms - Host latency: 3.60178 ms (enqueue 1.34956 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98225 ms - Host latency: 3.61638 ms (enqueue 1.34988 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97942 ms - Host latency: 3.60867 ms (enqueue 1.34309 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98086 ms - Host latency: 3.61577 ms (enqueue 1.34182 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98215 ms - Host latency: 3.60823 ms (enqueue 1.34751 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98311 ms - Host latency: 3.60562 ms (enqueue 1.35125 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98088 ms - Host latency: 3.61096 ms (enqueue 1.3593 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98198 ms - Host latency: 3.60869 ms (enqueue 1.2707 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98159 ms - Host latency: 3.60981 ms (enqueue 1.34072 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98 ms - Host latency: 3.60945 ms (enqueue 1.3375 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.02385 ms - Host latency: 3.64678 ms (enqueue 1.34099 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 3.11975 ms - Host latency: 3.74973 ms (enqueue 1.39148 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98125 ms - Host latency: 3.61636 ms (enqueue 1.37041 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98145 ms - Host latency: 3.61848 ms (enqueue 1.35737 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98228 ms - Host latency: 3.60535 ms (enqueue 1.34041 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98037 ms - Host latency: 3.61711 ms (enqueue 1.34119 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9802 ms - Host latency: 3.60479 ms (enqueue 1.34941 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98218 ms - Host latency: 3.62153 ms (enqueue 1.34263 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98218 ms - Host latency: 3.61616 ms (enqueue 1.34993 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98235 ms - Host latency: 3.61348 ms (enqueue 1.35571 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.98098 ms - Host latency: 3.60767 ms (enqueue 1.33418 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.9832 ms - Host latency: 3.61348 ms (enqueue 1.33994 ms)
[03/25/2023-18:57:23] [I] Average on 10 runs - GPU latency: 2.97966 ms - Host latency: 3.6061 ms (enqueue 1.34731 ms)
[03/25/2023-18:57:23] [I] 
[03/25/2023-18:57:23] [I] === Performance summary ===
[03/25/2023-18:57:23] [I] Throughput: 333.512 qps
[03/25/2023-18:57:23] [I] Latency: min = 3.55615 ms, max = 4.5824 ms, mean = 3.6227 ms, median = 3.61157 ms, percentile(90%) = 3.63989 ms, percentile(95%) = 3.65918 ms, percentile(99%) = 4.00616 ms
[03/25/2023-18:57:23] [I] Enqueue Time: min = 1.16992 ms, max = 1.65308 ms, mean = 1.34567 ms, median = 1.34741 ms, percentile(90%) = 1.37201 ms, percentile(95%) = 1.38245 ms, percentile(99%) = 1.44165 ms
[03/25/2023-18:57:23] [I] H2D Latency: min = 0.203857 ms, max = 0.229248 ms, mean = 0.209415 ms, median = 0.209229 ms, percentile(90%) = 0.212158 ms, percentile(95%) = 0.213135 ms, percentile(99%) = 0.215088 ms
[03/25/2023-18:57:23] [I] GPU Compute Time: min = 2.96533 ms, max = 3.95679 ms, mean = 2.99217 ms, median = 2.98169 ms, percentile(90%) = 2.99829 ms, percentile(95%) = 3.00543 ms, percentile(99%) = 3.38525 ms
[03/25/2023-18:57:23] [I] D2H Latency: min = 0.368896 ms, max = 0.645508 ms, mean = 0.421114 ms, median = 0.418671 ms, percentile(90%) = 0.439575 ms, percentile(95%) = 0.447998 ms, percentile(99%) = 0.525146 ms
[03/25/2023-18:57:23] [I] Total Host Walltime: 3.01039 s
[03/25/2023-18:57:23] [I] Total GPU Compute Time: 3.00414 s
[03/25/2023-18:57:23] [W] * GPU compute time is unstable, with coefficient of variance = 2.26049%.
[03/25/2023-18:57:23] [W]   If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[03/25/2023-18:57:23] [I] Explanations of the performance metrics are printed in the verbose logs.
[03/25/2023-18:57:23] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8503] # ./trtexec --onnx=./yolov5s.onnx --saveEngine=y.trt

cqzw555 · March 25, 2023, 11:45am

I have fixed the issue. Forgive my stupidity.

ifstream file(trtFile, ios_base::in |ios_base::binary);
    assert(file.good());
    file.seekg(0, ios::end);
    auto size = file.tellg();
    cout << size << endl;
    char *engineString = new char[size];
    assert(engineString);
    file.seekg(0, file.beg); // just add this line
    file.read(engineString, size);
    file.close();
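
For anyone landing here with the same error, the reason that one seekg line matters: after file.seekg(0, ios::end) the read position stays at the end of the file, so file.read() copies nothing and the buffer handed to deserializeCudaEngine never receives the engine bytes. The magic tag check then fails on whatever happens to be in that memory. Below is a minimal sketch of both behaviours using only the standard library; `bytesReadWithoutRewind` and `readEngineFile` are hypothetical helper names, not TensorRT APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Reproduces the original mistake: measure the file, then read without rewinding.
// Returns how many bytes the read actually delivered.
std::streamsize bytesReadWithoutRewind(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    assert(file.good());
    file.seekg(0, std::ios::end);               // position is now at end of file
    const std::streamoff size = file.tellg();
    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.read(buffer.data(), size);             // nothing left to read from here
    return file.gcount();                       // number of bytes actually read
}

// Hypothetical helper: read the whole file, rewinding first and checking the count.
std::vector<char> readEngineFile(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    file.seekg(0, std::ios::end);
    const std::streamoff size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios::beg);               // the line that was missing
    std::vector<char> buffer(static_cast<std::size_t>(size));
    file.read(buffer.data(), size);
    if (file.gcount() != size) {                // catch short reads explicitly
        throw std::runtime_error("short read on " + path);
    }
    return buffer;
}

// Usage with the names from the post:
//   std::vector<char> blob = readEngineFile(trtFile);
//   this->engine = this->runtime->deserializeCudaEngine(blob.data(), blob.size());
```

Checking gcount() against the expected size turns this kind of mistake into an immediate, readable error instead of a deserialization failure further down.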

3024196179 · April 22, 2024, 3:30am

I have run into the same problem. Could you tell me how you solved it?
