I use this repo to deploy a YOLOv7 model, and it is based on DeepStream 6.2. I am using a Jetson Xavier NX.
I know that DeepStream is intended for video and stream data, but before deploying I need to know the accuracy of the converted .engine model on my dataset (a list of images). I am unfamiliar with C++. What should I do?
There are TensorRT Python APIs: TensorRT/python at master · NVIDIA/TensorRT (github.com)
We have a sample of Python-based accuracy measurement: yolo_deepstream/yolov7_qat at main · NVIDIA-AI-IOT/yolo_deepstream (github.com)
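If it helps, below is a minimal sketch of how detections from the engine could be scored with pycocotools, assuming your dataset has COCO-format annotations; the annotation path and the single dummy detection are placeholders you would replace with real engine outputs after decoding and NMS:
# A minimal sketch, assuming COCO-format ground-truth annotations; the
# annotation path and the dummy detection below are placeholders to be
# replaced with real outputs from the .engine after decoding and NMS.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val.json")  # hypothetical annotation file

# One dict per detection; image_id/category_id must exist in the annotation file
results = [
    {"image_id": 1, "category_id": 1, "bbox": [10.0, 10.0, 50.0, 80.0], "score": 0.9},
]
coco_dt = coco_gt.loadRes(results)

ev = COCOeval(coco_gt, coco_dt, "bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()  # prints the COCO AP / mAP table
The linked sample follows a similar idea: run the engine over the validation images, collect the detections, and let COCOeval report the mAP numbers.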
I am also trying to evaluate the YOLOv7 .engine model. I have a custom plugin, nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so. Here is my code:
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
trt.init_libnvinfer_plugins(None, "nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so")
#trt.init_libnvinfer_plugins(None, "")
# Load the TensorRT engine
TRT_ENGINE_PATH = 'model_b1_gpu0_fp32.engine'
with open(TRT_ENGINE_PATH, 'rb') as f:
    engine_data = f.read()
runtime = trt.Runtime(trt.Logger())
engine = runtime.deserialize_cuda_engine(engine_data)
# Create an execution context on the GPU
context = engine.create_execution_context()
# Allocate memory for input and output buffers
input_size = trt.volume(engine.get_binding_shape(0)) * engine.max_batch_size
output_size = trt.volume(engine.get_binding_shape(1)) * engine.max_batch_size
input_buffer = cuda.mem_alloc(input_size * np.dtype(np.float32).itemsize)
output_buffer = cuda.mem_alloc(output_size * np.dtype(np.float32).itemsize)
# Create a stream for asynchronous execution
stream = cuda.Stream()
# Prepare input data
# Replace this with your own input preprocessing logic
input_data = np.random.randn(*engine.get_binding_shape(0)).astype(np.float32)
cuda.memcpy_htod_async(input_buffer, input_data.ravel(), stream)
# Execute the inference
context.execute_async_v2(
    bindings=[int(input_buffer), int(output_buffer)],
    stream_handle=stream.handle)
# Synchronize the stream
stream.synchronize()
# Retrieve the output results
output_data = np.empty(engine.get_binding_shape(1), dtype=np.float32)
cuda.memcpy_dtoh(output_data, output_buffer)
# Post-process the output data
# Replace this with your own output processing logic
print(output_data)
But I got this error
[05/19/2023-05:48:12] [TRT] [E] 1: [pluginV2Runner.cpp::load::300] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
[05/19/2023-05:48:12] [TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::66] Error Code 4: Internal Error (Engine deserialization failed.)
Traceback (most recent call last):
File "eval.py", line 16, in <module>
context = engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
I referred to the source code here, tensorflow - Inference with TensorRT .engine file on python - Stack Overflow, to evaluate my YOLOv7 .engine model. Could you help me, please?
I'm confused. The post tensorflow - Inference with TensorRT .engine file on python - Stack Overflow describes how to generate a model engine from a TLT model, but you have told us you are using the models from marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO models (github.com), whose YOLOv7 model is an ONNX model.
TensorRT can generate a model engine from an ONNX model with the “trtexec” tool. The tool can be found in the /usr/src/tensorrt/bin/ folder on your Jetson board if the TensorRT packages have been installed correctly. Please get familiar with TensorRT before you try to use it: NVIDIA Deep Learning TensorRT Documentation
The “nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so” in marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO models (github.com) is not a TensorRT nvinfer plugin but a general library that includes several nvinfer plugins and other functions. You cannot use “trt.init_libnvinfer_plugins()” with it. As for the YOLOv7 model in this repo, it is an ONNX model which needs no extra nvinfer plugin; the “trtexec” tool is enough.
Sorry for the inconvenience.
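For reference, here is a minimal sketch of that flow with the TensorRT Python API, assuming the ONNX model has been exported as described in the repo and the engine has been built with trtexec (the engine file name and the random input are placeholders):
# Engine built beforehand with something like:
#   /usr/src/tensorrt/bin/trtexec --onnx=yolov7.onnx --saveEngine=yolov7_fp16.engine --fp16
import numpy as np
import pycuda.autoinit  # creates and manages a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # built-in TensorRT plugins only; no DeepStream .so

with open("yolov7_fp16.engine", "rb") as f:  # placeholder engine file name
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one host/device buffer pair per binding
# (assumes static shapes, with binding 0 being the input tensor)
stream = cuda.Stream()
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

# Replace the random data with a letterboxed/normalized image matching the input shape
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for i in range(1, engine.num_bindings):
    cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
stream.synchronize()
print([b.shape for b in host_bufs[1:]])  # raw network outputs; decode/NMS as needed
Looping this over your validation images gives you the raw detections to feed into whatever accuracy metric you use.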
Let me clarify my flow. I used the repo GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO models to convert the yolov7.pt model
=> yolov7.wts, yolov7.cfg
=> model_b1_gpu0_fp32.engine
TensorRT model. That conversion also produced the custom plugin libnvdsinfer_custom_impl_Yolo.so.
Now I want to evaluate my .engine model with the Python API, using the same code as in my first post above, and I get the same error.
Thanks for hearing me out.
The DeepStream-Yolo/YOLOv7.md at master · marcoslucianops/DeepStream-Yolo · GitHub tells you how to generate the YOLOv7 ONNX model.
If you want to use the .wts model, you need to implement the nvinfer model-parsing plugin yourself; the model parsing in DeepStream-Yolo/yoloPlugins.cpp at master · marcoslucianops/DeepStream-Yolo · GitHub is for the YOLOv3 model, so it is of no use to you.
Can you consult the author of marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO models (github.com) about how to use his samples? The samples are not provided by NVIDIA.
The DeepStream-Yolo/YOLOv7.md at master · marcoslucianops/DeepStream-Yolo · GitHub flow I followed does not use a YOLOv7 ONNX model as an intermediate. It uses the flow yolov7.pt model => yolov7.wts, yolov7.cfg => model_b1_gpu0_fp32.engine
to generate the .engine model. The author of this repo does not share an evaluation part for the .engine model, so I cannot tell whether the converted .engine model is good or bad.
So I need your help. I'm a newbie with TensorRT.
This repo, yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub, is very useful and I could evaluate the model with it. But it does not cover converting to INT8 using images (PTQ). I want to generate a calibration cache file with the Python API and put that file in the DeepStream app config to get an INT8 .engine model.
Could you give me some advice? @Fiona.Chen
An INT8 calibration cache can be generated with the TensorRT APIs: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
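As a rough sketch (not a complete recipe), INT8 calibration from the Python API looks like the following; the ONNX file name, the 1x3x640x640 input shape, and the random calibration batches are placeholders you would replace with real preprocessed images:
import os
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

class ImageCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed image batches to TensorRT and writes the calibration cache."""
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)      # iterable of np.float32 arrays, e.g. shape (1, 3, 640, 640)
        self.cache_file = cache_file
        self.dev_buf = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None                   # no more data: calibration finished
        if self.dev_buf is None:
            self.dev_buf = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.dev_buf, np.ascontiguousarray(batch))
        return [int(self.dev_buf)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

def calibration_batches():
    # Placeholder: yield real letterboxed/normalized calibration images instead
    for _ in range(8):
        yield np.random.rand(1, 3, 640, 640).astype(np.float32)

# Building an INT8 engine with this calibrator also writes calib.cache
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
if not parser.parse(open("yolov7.onnx", "rb").read()):   # placeholder ONNX path
    raise RuntimeError(parser.get_error(0))
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = ImageCalibrator(calibration_batches())
engine_bytes = builder.build_serialized_network(network, config)
The resulting calib.cache is the file you can point your DeepStream nvinfer config to (the int8-calib-file property) together with the ONNX model.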
If your ONNX model has been generated successfully, the TensorRT/tools/Polygraphy at main · NVIDIA/TensorRT · GitHub tool can be used. Please refer to TensorRT/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt at main · NVIDIA/TensorRT · GitHub
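With Polygraphy the calibration data is supplied by a small Python script; a hedged sketch of such a data_loader.py is below (the input tensor name "images" and the 1x3x640x640 shape are assumptions, check them against your ONNX model):
# data_loader.py
import numpy as np

def load_data():
    # Yield one feed_dict per calibration batch; replace the random data
    # with real preprocessed images from your dataset.
    for _ in range(100):
        yield {"images": np.random.rand(1, 3, 640, 640).astype(np.float32)}
It can then be run roughly as in the linked example, e.g. polygraphy convert yolov7.onnx --int8 --data-loader-script ./data_loader.py --calibration-cache calib.cache -o yolov7_int8.engine.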
For more TensorRT-related issues and questions, please raise a topic in the TensorRT forum: Latest Deep Learning (Training & Inference)/TensorRT topics - NVIDIA Developer Forums.