• Hardware Platform: GPU
• DeepStream Version: 7.1
• TensorRT Version: 10.3
• NVIDIA GPU Driver Version: 560.35.03
• Issue Type: bug
I use the Docker container nvcr.io/nvidia/deepstream:7.1-gc-triton-devel
and can successfully run the C/C++ and Python samples.
My goal is to get the raw tensor output of a simple custom model from DeepStream with the same accuracy as with TensorRT's API. It is not a classifier, detector, or anything specific, just a model that takes a 128×128 float32 grayscale image normalized to [0, 1] as input and outputs 10 coefficients that I read directly, without any post-processing.
The model is an ONNX file trained with TensorFlow and converted with tf2onnx.convert (Python). It has the following dimensions (inspected with netron):
- input: tensor: float32[unk__8,1,128,128] (originally it was [unk__8,128,128], but it appears I was forced to add a dimension for DeepStream),
- output: tensor: float32[unk__9,10],
- note: I have not set those unk__x myself.
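For completeness, the shapes can also be checked programmatically; here is a minimal sketch using the onnx Python package (the package is not part of my pipeline, just a verification aid):

```python
# Minimal sketch: print the input/output shapes straight from the ONNX graph.
import onnx

model = onnx.load("model.onnx")
for value_info in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_param or d.dim_value
            for d in value_info.type.tensor_type.shape.dim]
    print(value_info.name, dims)
```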
The issue I have is that when I run the .engine file generated by DeepStream through a TensorRT API script, the result is close enough to the original TensorFlow result, but when the same engine is used through DeepStream the result, while obviously in the same ballpark, is not accurate enough at all.
For instance, here are the results:
| TRT API | DS nvinfer | Abs. diff. |
|---|---|---|
| -0.003 | -0.048 | 0.045 |
| -0.033 | -0.062 | 0.029 |
| 3.022 | 3.401 | 0.379 |
| 0.002 | 0.002 | 0.000 |
| 0.006 | -0.004 | 0.002 |
| -0.003 | -0.047 | 0.044 |
| -0.009 | -0.044 | 0.035 |
| 0.101 | 0.141 | 0.040 |
| -0.002 | -0.026 | 0.024 |
| -0.011 | -0.126 | 0.115 |
Here is a picture of the ONNX model analyzed by netron.
Here is the DeepStream config file for nvinfer, as config_nvinfer.yml:
property:
  gie-unique-id: 1 # Unique ID for the generated inference engine
  onnx-file: model.onnx
  model-engine-file: model.onnx_b1_gpu0_fp32.engine
  network-type: 100 # Other type of network
  model-color-format: 2 # Grayscale
  output-tensor-meta: 1 # Makes output tensor metadata available
  net-scale-factor: 0.00392156862745098 # Converts each pixel from [0, 255] to [0, 1]
  offsets: 0.0 # To be subtracted from each pixel (`mean`)
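My understanding is that with these settings nvinfer converts each pixel as pixel' = net-scale-factor × (pixel − offset) before feeding the network. Here is a minimal sketch of that expectation (the function name and the NCHW reshape are my own assumptions):

```python
# Sketch of the per-pixel preprocessing I expect nvinfer to apply with the
# config above: pixel' = net_scale_factor * (pixel - offset), [0, 255] -> [0, 1].
import numpy as np

NET_SCALE_FACTOR = 0.00392156862745098  # 1 / 255
OFFSET = 0.0

def expected_nvinfer_preprocess(gray_u8: np.ndarray) -> np.ndarray:
    """Convert a 128x128 GRAY8 frame to a float32 NCHW tensor in [0, 1]."""
    x = NET_SCALE_FACTOR * (gray_u8.astype(np.float32) - OFFSET)
    return x.reshape(1, 1, 128, 128)
```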
Here is the DeepStream runtime, as app.py:
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib
import pyds
import cv2 as cv
import numpy as np
import ctypes

Gst.init(None)

pipeline = Gst.parse_launch(
    "appsrc name=appsrc caps=video/x-raw,format=GRAY8,width=128,height=128,framerate=0/1 ! "
    "nvvideoconvert ! "
    "mux.sink_0 nvstreammux name=mux batch-size=1 width=128 height=128 ! "
    "nvinfer name=nvinfer config-file-path=config_nvinfer.yml ! "
    "fakesink"
)

# Push image to pipeline
img = cv.imread('image.png', cv.IMREAD_GRAYSCALE)  # np.uint8
buffer = Gst.Buffer.new_wrapped(img.tobytes())
pipeline.get_by_name("appsrc").emit("push-buffer", buffer)

# Probe inference output
def nvinfer_src_pad_buffer_probe_callback(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if not batch_meta:
        return Gst.PadProbeReturn.OK
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                layer_index = 0
                layer = pyds.get_nvds_LayerInfo(tensor_meta, layer_index)
                ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
                v = np.ctypeslib.as_array(ptr, shape=(10,))
                print(v)
            l_user = l_user.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

pipeline.get_by_name("nvinfer").get_static_pad("src").add_probe(
    Gst.PadProbeType.BUFFER, nvinfer_src_pad_buffer_probe_callback, 0
)

# Runtime
pipeline.set_state(Gst.State.PLAYING)
try:
    GLib.MainLoop().run()
except KeyboardInterrupt:
    pass
pipeline.set_state(Gst.State.NULL)
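The values printed by the probe can also be copied out and saved for an offline diff against the TensorRT run; a minimal sketch (the helper name and the ds_output.npy file name are arbitrary; it would be called right after print(v) in the probe):

```python
# Sketch: copy the (10,) float32 view out of the probe (the underlying buffer
# is owned by nvinfer) and save it for an offline diff with the TensorRT run.
import numpy as np

def save_tensor_copy(v: np.ndarray, path: str = "ds_output.npy") -> None:
    np.save(path, np.array(v, copy=True))
```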
Here is the TensorRT API script:
import common  # module included with the TensorRT Python samples
import tensorrt as trt
import cv2 as cv
import numpy as np

img = cv.imread('image.png', cv.IMREAD_GRAYSCALE)  # uint8

# Replicate DeepStream's preprocessing
NET_SCALE_FACTOR = 0.00392156862745098  # 1 / 255
MEAN = 0.0
img = img.astype(np.float32)
img = NET_SCALE_FACTOR * (img - MEAN)

# Infer
with open('model.onnx_b1_gpu0_fp32.engine', "rb") as file, trt.Runtime(trt.Logger()) as runtime:
    with runtime.deserialize_cuda_engine(file.read()) as engine:
        with engine.create_execution_context() as context:
            inputs, outputs, bindings, stream = common.allocate_buffers(engine)
            inputs[0].host = img
            trt_outputs = common.do_inference(
                context,
                engine,
                bindings,
                inputs,
                outputs,
                stream
            )
            print(trt_outputs[0])
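And a minimal sketch of how the two outputs can be diffed element-wise (it assumes trt_outputs from the script above and the ds_output.npy dump sketched after app.py):

```python
# Sketch: element-wise comparison of the two runs, printed in the same
# "TRT API | DS nvinfer | Abs. diff." layout as the table above.
import numpy as np

trt_out = np.asarray(trt_outputs[0], dtype=np.float32).reshape(-1)[:10]
ds_out = np.load("ds_output.npy").reshape(-1)[:10]
for t, d in zip(trt_out, ds_out):
    print(f"{t:+.3f} | {d:+.3f} | {abs(t - d):.3f}")
```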
Any idea what's wrong?