I generated a .engine model from an .onnx model with the export.py tool from the Linaom1214/TensorRT-For-YOLO-Series repo on GitHub. This repo builds the .engine with the TensorRT Python API, not with trtexec (roughly along the lines of the sketch below).
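As I understand it, the build goes something like this. This is only a minimal sketch of a Python-API build, not the exact export.py code; the file names, workspace size, and INT8 handling are illustrative:

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
config.set_flag(trt.BuilderFlag.INT8)  # a real INT8 build also needs a calibrator
# Build with detailed layer info so --exportLayerInfo returns more than names:
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(engine_bytes)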
From the generated .engine, I can produce two JSON files, profile.json and graph.json, with this command:

/usr/src/tensorrt/bin/trtexec --loadEngine=model.engine --exportProfile=profile.json --exportLayerInfo=graph.json

This is the output of the above command:
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov7_NOT_dynamic_batch_INT8.engine --exportProfile=profile.json --exportLayerInfo=graph.json
[05/30/2023-10:20:41] [I] === Model Options ===
[05/30/2023-10:20:41] [I] Format: *
[05/30/2023-10:20:41] [I] Model:
[05/30/2023-10:20:41] [I] Output:
[05/30/2023-10:20:41] [I] === Build Options ===
[05/30/2023-10:20:41] [I] Max batch: 1
[05/30/2023-10:20:41] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/30/2023-10:20:41] [I] minTiming: 1
[05/30/2023-10:20:41] [I] avgTiming: 8
[05/30/2023-10:20:41] [I] Precision: FP32
[05/30/2023-10:20:41] [I] LayerPrecisions:
[05/30/2023-10:20:41] [I] Calibration:
[05/30/2023-10:20:41] [I] Refit: Disabled
[05/30/2023-10:20:41] [I] Sparsity: Disabled
[05/30/2023-10:20:41] [I] Safe mode: Disabled
[05/30/2023-10:20:41] [I] DirectIO mode: Disabled
[05/30/2023-10:20:41] [I] Restricted mode: Disabled
[05/30/2023-10:20:41] [I] Build only: Disabled
[05/30/2023-10:20:41] [I] Save engine:
[05/30/2023-10:20:41] [I] Load engine: yolov7_NOT_dynamic_batch_INT8.engine
[05/30/2023-10:20:41] [I] Profiling verbosity: 0
[05/30/2023-10:20:41] [I] Tactic sources: Using default tactic sources
[05/30/2023-10:20:41] [I] timingCacheMode: local
[05/30/2023-10:20:41] [I] timingCacheFile:
[05/30/2023-10:20:41] [I] Heuristic: Disabled
[05/30/2023-10:20:41] [I] Preview Features: Use default preview flags.
[05/30/2023-10:20:41] [I] Input(s)s format: fp32:CHW
[05/30/2023-10:20:41] [I] Output(s)s format: fp32:CHW
[05/30/2023-10:20:41] [I] Input build shapes: model
[05/30/2023-10:20:41] [I] Input calibration shapes: model
[05/30/2023-10:20:41] [I] === System Options ===
[05/30/2023-10:20:41] [I] Device: 0
[05/30/2023-10:20:41] [I] DLACore:
[05/30/2023-10:20:41] [I] Plugins:
[05/30/2023-10:20:41] [I] === Inference Options ===
[05/30/2023-10:20:41] [I] Batch: 1
[05/30/2023-10:20:41] [I] Input inference shapes: model
[05/30/2023-10:20:41] [I] Iterations: 10
[05/30/2023-10:20:41] [I] Duration: 3s (+ 200ms warm up)
[05/30/2023-10:20:41] [I] Sleep time: 0ms
[05/30/2023-10:20:41] [I] Idle time: 0ms
[05/30/2023-10:20:41] [I] Streams: 1
[05/30/2023-10:20:41] [I] ExposeDMA: Disabled
[05/30/2023-10:20:41] [I] Data transfers: Enabled
[05/30/2023-10:20:41] [I] Spin-wait: Disabled
[05/30/2023-10:20:41] [I] Multithreading: Disabled
[05/30/2023-10:20:41] [I] CUDA Graph: Disabled
[05/30/2023-10:20:41] [I] Separate profiling: Disabled
[05/30/2023-10:20:41] [I] Time Deserialize: Disabled
[05/30/2023-10:20:41] [I] Time Refit: Disabled
[05/30/2023-10:20:41] [I] NVTX verbosity: 0
[05/30/2023-10:20:41] [I] Persistent Cache Ratio: 0
[05/30/2023-10:20:41] [I] Inputs:
[05/30/2023-10:20:41] [I] === Reporting Options ===
[05/30/2023-10:20:41] [I] Verbose: Disabled
[05/30/2023-10:20:41] [I] Averages: 10 inferences
[05/30/2023-10:20:41] [I] Percentiles: 90,95,99
[05/30/2023-10:20:41] [I] Dump refittable layers:Disabled
[05/30/2023-10:20:41] [I] Dump output: Disabled
[05/30/2023-10:20:41] [I] Profile: Disabled
[05/30/2023-10:20:41] [I] Export timing to JSON file:
[05/30/2023-10:20:41] [I] Export output to JSON file:
[05/30/2023-10:20:41] [I] Export profile to JSON file: profile.json
[05/30/2023-10:20:41] [I]
[05/30/2023-10:20:41] [I] === Device Information ===
[05/30/2023-10:20:41] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti
[05/30/2023-10:20:41] [I] Compute Capability: 7.5
[05/30/2023-10:20:41] [I] SMs: 68
[05/30/2023-10:20:41] [I] Compute Clock Rate: 1.545 GHz
[05/30/2023-10:20:41] [I] Device Global Memory: 11019 MiB
[05/30/2023-10:20:41] [I] Shared Memory per SM: 64 KiB
[05/30/2023-10:20:41] [I] Memory Bus Width: 352 bits (ECC disabled)
[05/30/2023-10:20:41] [I] Memory Clock Rate: 7 GHz
[05/30/2023-10:20:41] [I]
[05/30/2023-10:20:41] [I] TensorRT version: 8.5.2
[05/30/2023-10:20:41] [I] Engine loaded in 0.0537632 sec.
[05/30/2023-10:20:41] [I] [TRT] Loaded engine size: 38 MiB
[05/30/2023-10:20:42] [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[05/30/2023-10:20:42] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +41, now: CPU 0, GPU 41 (MiB)
[05/30/2023-10:20:42] [I] Engine deserialized in 0.504064 sec.
[05/30/2023-10:20:42] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +28, now: CPU 0, GPU 69 (MiB)
[05/30/2023-10:20:42] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[05/30/2023-10:20:42] [I] Setting persistentCacheLimit to 0 bytes.
[05/30/2023-10:20:42] [I] Using random values for input images
[05/30/2023-10:20:42] [I] Created input binding for images with dimensions 1x3x640x640
[05/30/2023-10:20:42] [I] Using random values for output output
[05/30/2023-10:20:42] [I] Created output binding for output with dimensions 1x25200x85
[05/30/2023-10:20:42] [I] [TRT] The profiling verbosity was set to ProfilingVerbosity::kLAYER_NAMES_ONLY when the engine was built, so only the layer names will be returned. Rebuild the engine with ProfilingVerbosity::kDETAILED to get more verbose layer information.
[05/30/2023-10:20:42] [I] Starting inference
[05/30/2023-10:20:45] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[05/30/2023-10:20:45] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov7_NOT_dynamic_batch_INT8.engine --exportProfile=profile.json --exportLayerInfo=graph.json
How can I get the profile.metadata.json file from the generated .engine model?
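In the meantime I considered writing a minimal metadata file by hand. This is only a guess at the schema, inferred from the device information trtexec prints; the field names and output path below are mine, and the real file may contain more:

import json

import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on device 0

dev = cuda.Device(0)
attrs = dev.get_attributes()
metadata = {
    # Field names are my own guess, not a confirmed schema.
    "device": dev.name(),
    "compute_capability": "%d.%d" % dev.compute_capability(),
    "sm_count": attrs[cuda.device_attribute.MULTIPROCESSOR_COUNT],
    "global_memory_mib": dev.total_memory() // (1024 * 1024),
}
with open("profile.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)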
I saw that the TensorRT Engine Explorer tutorial (TensorRT/tutorial.ipynb at main · NVIDIA/TensorRT · GitHub) needs three JSON files. If we generate the .engine model without using trtexec, how can we get the full set of JSON files for TensorRT Engine Explorer?
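For reference, my understanding is that the tutorial loads the three files into an EnginePlan, roughly like this; the paths are illustrative and I am not certain of the exact constructor signature:

from trex import EnginePlan

plan = EnginePlan(
    "graph.json",             # from trtexec --exportLayerInfo
    "profile.json",           # from trtexec --exportProfile
    "profile.metadata.json",  # the file I do not know how to generate
)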
Thanks