Profile per-layer inference time of an .engine model to find the bottleneck in DeepStream?

I want to know the inference time of each layer of an .engine model in DeepStream, to better understand where the bottleneck is. Is there any tool that supports this in DeepStream 6.2? Thanks


Hi @linhbkpro2010,
since it's an .engine file, you should use the nvinfer plugin, which is based on TensorRT.

So you can use TensorRT directly to profile it with the command below:

$ /usr/src/tensorrt/bin/trtexec --loadEngine=swin_tiny_patch4_window7_224_bs8_best.engine --dumpProfile
…
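If you also want the per-layer numbers saved for later analysis, trtexec can export them to JSON in the same run. A minimal sketch, reusing the engine name above; the output filenames are arbitrary, and note that the exported layer info only contains full details if the engine was built with detailed profiling verbosity:

# --dumpProfile prints per-layer timings to the console;
# --separateProfileRun profiles in a pass separate from the throughput run;
# profile.json / graph.json are just example output names.
$ /usr/src/tensorrt/bin/trtexec --loadEngine=swin_tiny_patch4_window7_224_bs8_best.engine \
    --dumpProfile --separateProfileRun \
    --exportProfile=profile.json --exportLayerInfo=graph.json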

I ran the command to check the inference time of each layer, but I got an error:

&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine
[05/19/2023-02:35:26] [I] === Model Options ===
[05/19/2023-02:35:26] [I] Format: *
[05/19/2023-02:35:26] [I] Model: 
[05/19/2023-02:35:26] [I] Output:
[05/19/2023-02:35:26] [I] === Build Options ===
[05/19/2023-02:35:26] [I] Max batch: 1
[05/19/2023-02:35:26] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[05/19/2023-02:35:26] [I] minTiming: 1
[05/19/2023-02:35:26] [I] avgTiming: 8
[05/19/2023-02:35:26] [I] Precision: FP32
[05/19/2023-02:35:26] [I] LayerPrecisions: 
[05/19/2023-02:35:26] [I] Calibration: 
[05/19/2023-02:35:26] [I] Refit: Disabled
[05/19/2023-02:35:26] [I] Sparsity: Disabled
[05/19/2023-02:35:26] [I] Safe mode: Disabled
[05/19/2023-02:35:26] [I] DirectIO mode: Disabled
[05/19/2023-02:35:26] [I] Restricted mode: Disabled
[05/19/2023-02:35:26] [I] Build only: Disabled
[05/19/2023-02:35:26] [I] Save engine: 
[05/19/2023-02:35:26] [I] Load engine: model_b1_gpu0_fp32.engine
[05/19/2023-02:35:26] [I] Profiling verbosity: 0
[05/19/2023-02:35:26] [I] Tactic sources: Using default tactic sources
[05/19/2023-02:35:26] [I] timingCacheMode: local
[05/19/2023-02:35:26] [I] timingCacheFile: 
[05/19/2023-02:35:26] [I] Heuristic: Disabled
[05/19/2023-02:35:26] [I] Preview Features: Use default preview flags.
[05/19/2023-02:35:26] [I] Input(s)s format: fp32:CHW
[05/19/2023-02:35:26] [I] Output(s)s format: fp32:CHW
[05/19/2023-02:35:26] [I] Input build shapes: model
[05/19/2023-02:35:26] [I] Input calibration shapes: model
[05/19/2023-02:35:26] [I] === System Options ===
[05/19/2023-02:35:26] [I] Device: 0
[05/19/2023-02:35:26] [I] DLACore: 
[05/19/2023-02:35:26] [I] Plugins:
[05/19/2023-02:35:26] [I] === Inference Options ===
[05/19/2023-02:35:26] [I] Batch: 1
[05/19/2023-02:35:26] [I] Input inference shapes: model
[05/19/2023-02:35:26] [I] Iterations: 10
[05/19/2023-02:35:26] [I] Duration: 3s (+ 200ms warm up)
[05/19/2023-02:35:26] [I] Sleep time: 0ms
[05/19/2023-02:35:26] [I] Idle time: 0ms
[05/19/2023-02:35:26] [I] Streams: 1
[05/19/2023-02:35:26] [I] ExposeDMA: Disabled
[05/19/2023-02:35:26] [I] Data transfers: Enabled
[05/19/2023-02:35:26] [I] Spin-wait: Disabled
[05/19/2023-02:35:26] [I] Multithreading: Disabled
[05/19/2023-02:35:26] [I] CUDA Graph: Disabled
[05/19/2023-02:35:26] [I] Separate profiling: Disabled
[05/19/2023-02:35:26] [I] Time Deserialize: Disabled
[05/19/2023-02:35:26] [I] Time Refit: Disabled
[05/19/2023-02:35:26] [I] NVTX verbosity: 0
[05/19/2023-02:35:26] [I] Persistent Cache Ratio: 0
[05/19/2023-02:35:26] [I] Inputs:
[05/19/2023-02:35:26] [I] === Reporting Options ===
[05/19/2023-02:35:26] [I] Verbose: Disabled
[05/19/2023-02:35:26] [I] Averages: 10 inferences
[05/19/2023-02:35:26] [I] Percentiles: 90,95,99
[05/19/2023-02:35:26] [I] Dump refittable layers:Disabled
[05/19/2023-02:35:26] [I] Dump output: Disabled
[05/19/2023-02:35:26] [I] Profile: Disabled
[05/19/2023-02:35:26] [I] Export timing to JSON file: 
[05/19/2023-02:35:26] [I] Export output to JSON file: 
[05/19/2023-02:35:26] [I] Export profile to JSON file: 
[05/19/2023-02:35:26] [I] 
[05/19/2023-02:35:26] [I] === Device Information ===
[05/19/2023-02:35:26] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti
[05/19/2023-02:35:26] [I] Compute Capability: 7.5
[05/19/2023-02:35:26] [I] SMs: 68
[05/19/2023-02:35:26] [I] Compute Clock Rate: 1.545 GHz
[05/19/2023-02:35:26] [I] Device Global Memory: 11016 MiB
[05/19/2023-02:35:26] [I] Shared Memory per SM: 64 KiB
[05/19/2023-02:35:26] [I] Memory Bus Width: 352 bits (ECC disabled)
[05/19/2023-02:35:26] [I] Memory Clock Rate: 7 GHz
[05/19/2023-02:35:26] [I] 
[05/19/2023-02:35:26] [I] TensorRT version: 8.5.2
[05/19/2023-02:35:26] [I] Engine loaded in 0.211239 sec.
[05/19/2023-02:35:26] [I] [TRT] Loaded engine size: 175 MiB
[05/19/2023-02:35:27] [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[05/19/2023-02:35:27] [E] Error[1]: [pluginV2Runner.cpp::load::300] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[05/19/2023-02:35:27] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
[05/19/2023-02:35:27] [E] Engine deserialization failed
[05/19/2023-02:35:27] [E] Got invalid engine!
[05/19/2023-02:35:27] [E] Inference set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine

My .engine model was converted following the repo GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO models. I use this command to convert the .pt model to .engine:

deepstream-app -c deepstream_app_config_orig.txt

I'm unfamiliar with C++. Please give me some advice on how to profile the inference of each layer. Thanks.

Are you running trtexec with the engine on a different type of GPU?

Yes, maybe I used a different GPU when running

deepstream-app -c deepstream_app_config_orig.txt

and

/usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine

But I ran the two commands again with CUDA_VISIBLE_DEVICES=0 set for both, and I still got almost the same error as above:

root@a8db07bc1951:/opt/nvidia/deepstream/deepstream-6.2/sources/DeepStream-Yolo# CUDA_VISIBLE_DEVICES=0 /usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine
[... same Model/Build/Inference/Reporting options and Device Information output as in the log above ...]
[05/19/2023-02:54:23] [I] Engine loaded in 0.208848 sec.
[05/19/2023-02:54:24] [I] [TRT] Loaded engine size: 175 MiB
[05/19/2023-02:54:24] [E] Error[1]: [pluginV2Runner.cpp::load::300] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[05/19/2023-02:54:24] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
[05/19/2023-02:54:24] [E] Engine deserialization failed
[05/19/2023-02:54:24] [E] Got invalid engine!
[05/19/2023-02:54:24] [E] Inference set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine

I don't know where the problem is.

Does your model need a TensorRT plugin?
If it does, you need to specify "--plugins=$TRT_PLUGIN_LIB" to load the plugin library.
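For example, if the engine was built with the DeepStream-Yolo custom layers, point trtexec at the custom library that repo builds. A sketch only; the .so path below is an assumption, so adjust it to wherever your plugin library actually lives:

# Load the custom plugin library before deserializing the engine,
# then dump the per-layer profile. The library path is an assumption
# based on the DeepStream-Yolo build directory.
$ /usr/src/tensorrt/bin/trtexec --loadEngine=model_b1_gpu0_fp32.engine \
    --plugins=./nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so \
    --dumpProfile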

Thanks a lot, what a quick, awesome response. You are right. Now I can see the expected info.


Are there any other tools better suited for visualizing and analyzing the per-layer profiling?

Regarding the image in comment #6 above, why do you think that output is not friendly enough?


I have generated the .engine model using the TensorRT Python API (I didn't use trtexec). With trtexec and the above command I can get the profile for each layer.

I found that NVIDIA has a tool, TensorRT Engine Explorer, with more features. But this tool needs 3 JSON files: profile.json, graph.json, and meta_profile.json. Since I already have the .engine model, I can get the first 2 JSON files by using trtexec --exportProfile .... How can I get the third JSON file, meta_profile.json?

@mchi I know that there are many ways to convert an .onnx model to an .engine model.

Hi @johnminho,
You can refer to yolo_deepstream/Guidance_of_QAT_performance_optimization.md at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub to use TensorRT Engine Explorer with the JSON files.
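If you check out the TensorRT Engine Explorer (trex) sources, its helper script can rebuild the engine from the ONNX file and emit the graph/profile/metadata JSON files it expects in one pass. A sketch only; the script path, argument order, and flags below are assumptions, so check the trex README for the exact invocation:

# Builds the engine with detailed profiling verbosity and writes the
# JSON files trex needs into outputs/ (paths and arguments are assumptions).
$ python3 trt-engine-explorer/utils/process_engine.py yolov7.onnx outputs/ fp16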


Thanks. The reference you sent only draws the graph of the engine model. I want to analyze the per-layer inference profiling in depth, and TensorRT Engine Explorer can give me more insight.

I want to confirm: is there any way to create the meta_profile.json file from the generated .engine?

As I understand it, I need to use the TensorRT Engine Explorer code to generate the engine model and the 3 JSON files, so that I can then use the TensorRT Engine Explorer tools to analyze the model. Is that right?

@mchi

I use this example from NVIDIA yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub to generate the FP32 model. But when I export the graph.json file, it only contains layer names. How do I set ProfilingVerbosity::DETAILED in DeepStream to get the full graph.json for later use in TensorRT Engine Explorer? I searched in /opt/nvidia/deepstream/deepstream-6.2/sources/apps/sample_apps/deepstream-app/ but couldn't find where it needs to be set. Sorry, I'm unfamiliar with C++.
I also tried changing:

    void setProfilingVerbosity(ProfilingVerbosity verbosity) noexcept
    {
        // mImpl->setProfilingVerbosity(verbosity);
        mImpl->setProfilingVerbosity(ProfilingVerbosity::DETAILED);
    }

But it has no effect; what is wrong?
Is there any way to set it in deepstream_config.txt? Thanks.

Hi, @johnminho

One solution is to use trtexec to save the engine and export the layer info to a file, and then have DeepStream load that engine directly (e.g. by pointing model-engine-file in the nvinfer config at it).
Sample command for trtexec:

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov7.onnx --fp16 --int8 --verbose --saveEngine=yolov7_ptq.engine --workspace=1024000 --warmUp=500 --duration=10 --useCudaGraph --useSpinWait --noDataTransfers --exportLayerInfo=yolov7_ptq_layer.json --profilingVerbosity=detailed --exportProfile=yolov7_ptq_profile.json

ref: https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/yolov7_qat/doc/Guidance_of_QAT_performance_optimization.md
and the visualization script is here:
https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/main/yolov7_qat/scripts/draw-engine.py

@haowang
Thanks. This is one solution.
I want to confirm: what API does DeepStream use to generate the engine from the .onnx model? Does it use trtexec?
Could you please point me to the source code that converts the .onnx model to .engine in DeepStream? I mean the path to the source code.

It does the same thing as trtexec; you can check the nvinfer source code in /opt/nvidia/deepstream/deepstream-6.2/sources/libs/nvdsinfer.
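If you want to jump straight to the engine-building code, a quick grep for the TensorRT builder calls narrows it down; the symbol names below are just the obvious candidates and may not all appear in every DeepStream version:

# Locate where nvdsinfer creates the TensorRT builder and serializes the engine.
$ grep -rnE "createInferBuilder|buildSerializedNetwork" \
    /opt/nvidia/deepstream/deepstream-6.2/sources/libs/nvdsinfer/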

@mchi Thank you so much.
