How to get `nvinfer` to be as accurate as TensorRT's API?

• Hardware Platform GPU
• DeepStream Version 7.1
• TensorRT Version 10.3
• NVIDIA GPU Driver Version 560.35.03
• Issue Type bug

I use the Docker container nvcr.io/nvidia/deepstream:7.1-gc-triton-devel and can successfully run the C/C++ and Python samples.

My goal is to get the raw tensor output of a simple custom model to be as accurate as with TensorRT’s API. It is not a classifier, detector or anything specific, just a model that takes a 128×128 float32 grayscale image normalized to [0, 1] as input and outputs 10 coefficients, which I read directly without any post-processing.

The model is an ONNX file trained with TensorFlow and converted with tf2onnx.convert (Python). It has the following dimensions (inspected with netron); a programmatic shape check is sketched after this list:

  • input: tensor: float32[unk__8,1,128,128]
    • (originally it was [unk__8,128,128], but it appears I had to add a dimension for DeepStream),
  • output: tensor: float32[unk__9,10],
  • note: I have not set those unk__x myself.
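
For the record, the same shapes can also be checked programmatically; a minimal sketch using the onnx Python package (assuming model.onnx is in the current directory):

import onnx

model = onnx.load('model.onnx')
for tensor in list(model.graph.input) + list(model.graph.output):
    # dim_param holds symbolic names such as 'unk__8', dim_value holds fixed sizes
    dims = [d.dim_param or d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)  # e.g. ['unk__8', 1, 128, 128] for the input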

The issue I have is that when I infer with the .engine file generated (by DeepStream) through a TensorRT API script, the result is close enough to the original TensorFlow result, but when using it through DeepStream the result, while obviously close, is not accurate enough at all.

For instance here are the results:

TRT API    DS nvinfer    Abs. diff.
 -0.003        -0.048         0.045
 -0.033        -0.062         0.029
  3.022         3.401         0.379
  0.002         0.002         0.000
  0.006        -0.004         0.002
 -0.003        -0.047         0.044
 -0.009        -0.044         0.035
  0.101         0.141         0.040
 -0.002        -0.026         0.024
 -0.011        -0.126         0.115

Here is a picture of the ONNX model analyzed by netron.

Here is the DeepStream config file for nvinfer as config_nvinfer.yml:

property:
  gie-unique-id: 1  # Unique ID for generated inference engine
  onnx-file: model.onnx
  model-engine-file: model.onnx_b1_gpu0_fp32.engine
  network-type: 100  # Other type of network
  model-color-format: 2  # Grayscale
  output-tensor-meta: 1  # Makes output meta data available
  net-scale-factor: 0.00392156862745098  # Converts each pixel from [0, 255] to [0, 1]
  offsets: 0.0  # To be subtracted from each pixel (the `mean`)
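
For reference, with offsets at 0 this configuration makes Gst-nvinfer's per-pixel preprocessing reduce to y = net-scale-factor × (x − offset); a minimal sketch of the expected mapping for a GRAY8 frame:

import numpy as np

NET_SCALE_FACTOR = 1.0 / 255.0
OFFSET = 0.0

def expected_preprocess(gray_u8: np.ndarray) -> np.ndarray:
    # Per-pixel mapping nvinfer is expected to apply with the config above
    return NET_SCALE_FACTOR * (gray_u8.astype(np.float32) - OFFSET)

print(expected_preprocess(np.array([0, 128, 255], dtype=np.uint8)))  # ~[0.0, 0.502, 1.0]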

Here is the DeepStream runtime as app.py:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib
import pyds
import cv2 as cv
import numpy as np
import ctypes

Gst.init(None)

pipeline = Gst.parse_launch(
    "appsrc name=appsrc caps=video/x-raw,format=GRAY8,width=128,height=128,framerate=0/1 ! "
    "nvvideoconvert ! "
    "mux.sink_0 nvstreammux name=mux batch-size=1 width=128 height=128 ! "
    "nvinfer name=nvinfer config-file-path=config_nvinfer.yml ! "
    "fakesink"
)

# Push image to pipeline

img = cv.imread('image.png', cv.IMREAD_GRAYSCALE)  # np.uint8
buffer = Gst.Buffer.new_wrapped(img.tobytes())
pipeline.get_by_name("appsrc").emit("push-buffer", buffer)

# Probe inference output

def nvinfer_src_pad_buffer_probe_callback(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    if not batch_meta:
        return Gst.PadProbeReturn.OK
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                layer_index = 0
                layer = pyds.get_nvds_LayerInfo(tensor_meta, layer_index)
                ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
                v = np.ctypeslib.as_array(ptr, shape=(10,))
                print(v)
            l_user = l_user.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

pipeline.get_by_name("nvinfer").get_static_pad("src").add_probe(
    Gst.PadProbeType.BUFFER, nvinfer_src_pad_buffer_probe_callback, 0
)

# Runtime

pipeline.set_state(Gst.State.PLAYING)
try:
    GLib.MainLoop().run()
except KeyboardInterrupt:
    pass
pipeline.set_state(Gst.State.NULL)

Here is the TensorRT API script:

import common  # helper module shipped with the TensorRT Python samples
import tensorrt as trt
import cv2 as cv
import numpy as np

img = cv.imread('image.png', cv.IMREAD_GRAYSCALE)  # uint8

# Replicate DeepStream's preprocessing

NET_SCALE_FACTOR = 0.00392156862745098  # 1 / 255
MEAN = 0.0
img = img.astype(np.float32)
img = NET_SCALE_FACTOR * (img - MEAN)

# Infer

with open('model.onnx_b1_gpu0_fp32.engine', "rb") as file, trt.Runtime(trt.Logger()) as runtime:
    with runtime.deserialize_cuda_engine(file.read()) as engine:
        with engine.create_execution_context() as context:
            inputs, outputs, bindings, stream = common.allocate_buffers(engine)
            inputs[0].host = img
            trt_outputs = common.do_inference(
                context,
                engine,
                bindings,
                inputs,
                outputs,
                stream
            )
print(trt_outputs[0])

Any idea what’s wrong?

What is the resolution of image.png? DeepStream also leverages TensorRT to do the inference, so if the inference result is not the same, please check whether the preprocessed data is the same.

  1. You can add the following nvinfer configuration first:
  dump-input-tensor: 1

After running again, nvinfer will dump the preprocessed data to ip_tensor_dump.bin in the current directory. You can compare it with the preprocessed data used in the TensorRT test.
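
For example, a minimal comparison sketch (assuming image.png is the same 128×128 grayscale input used in both tests and that ip_tensor_dump.bin ends up in the current directory):

import cv2 as cv
import numpy as np

NET_SCALE_FACTOR = 0.00392156862745098  # 1 / 255, same as in the TensorRT script
ref = NET_SCALE_FACTOR * cv.imread('image.png', cv.IMREAD_GRAYSCALE).astype(np.float32)
dumped = np.fromfile('ip_tensor_dump.bin', dtype=np.float32, count=128 * 128).reshape(128, 128)
diff = np.abs(dumped - ref)
print(f"max abs diff: {diff.max():.6f}  mean abs diff: {diff.mean():.6f}")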


It is 128×128 as set for the appsrc element.

Thanks for the tip.

Actually, the preprocessing I really do is a min-max normalization followed by conversion to uint8:

img = cv.imread('image.png', cv.IMREAD_UNCHANGED)
img = cv.normalize(img, None, 0, 255, cv.NORM_MINMAX, cv.CV_8U)

I got the dumped input tensor, which I read the following way:

import numpy as np
import matplotlib.pyplot as plt

data = np.fromfile('ip_tensor_dump.bin', dtype=np.float32, count=128*128)
print(f"min: {data.min():.20f}")
print(f"max: {data.max():.20f}")

img = np.reshape(data, (128, 128))
plt.imshow(img, cmap='gray', vmin=0, vmax=1)
plt.show()

which outputs:

min: 0.06274510174989700317
max: 0.92156869173049926758

(and displays the visually faithful input image).

Unfortunately, with the same git-cloned code executed on a Jetson AGX Orin with the same versions of DeepStream and TensorRT installed, the results are accurate (!) with respect to TensorFlow/TensorRT, and the same “dumped input tensor” inspection as above gives:

min: 0.00000000000000000000
max: 1.00000000000000000000

Well, then I really don’t know what to do… Downgrade the GPU driver?

  1. Could you share the whole code? Did you use cv.imread and cv.normalize in both tests?
  2. Could you elaborate on your conclusions? Is there a DeepStream issue? About “displays the visually faithful input image”: do you mean that restoring ip_tensor_dump.bin generated by DeepStream gives back the correct image compared with image.png?
  3. Please compare the preprocessed data of the app.py test and the TensorRT API test. ip_tensor_dump.bin is the normalized data (the gray value multiplied by NET_SCALE_FACTOR); dividing it by NET_SCALE_FACTOR gives back the gray values, which you can view with a player to check whether the gray picture is correct (see the sketch below).
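
A minimal sketch of that check (assuming ip_tensor_dump.bin holds a single 128×128 float32 plane, as above):

import cv2 as cv
import numpy as np

NET_SCALE_FACTOR = 0.00392156862745098  # 1 / 255
data = np.fromfile('ip_tensor_dump.bin', dtype=np.float32, count=128 * 128).reshape(128, 128)
gray = np.clip(data / NET_SCALE_FACTOR, 0, 255).astype(np.uint8)  # undo the normalization
cv.imwrite('ip_tensor_dump_gray.png', gray)  # open with any image viewer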

I’m sorry for the mix-up. The whole code is what I gave in the first message, except that in both cases:

img = cv.imread('image.png', cv.IMREAD_GRAYSCALE)

is replaced by:

img = cv.imread('image.png', cv.IMREAD_UNCHANGED)
img = cv.normalize(img, None, 0, 255, cv.NORM_MINMAX, cv.CV_8U)

But anyway, the exact same code gives different results in the laptop container versus on the Jetson.

My conclusion is that it must be a hardware-implementation-specific issue in DeepStream’s Gst-nvinfer or some dependency it uses.

I just tested on another Jetson AGX Orin which has DeepStream 6.3, and it was also fine. I may try on another PC/GPU at some point, but not right now.

Yes, but only “faithful” in the sense that I recognize it and visually couldn’t tell it is wrong (it also has the same orientation); however, calling np.min() and np.max() shows that it is at least no longer normalized.

But here are a bunch more stats:

import cv2 as cv
import numpy as np

FILE_PATH = 'image_original.png'
img = cv.imread(FILE_PATH, cv.IMREAD_UNCHANGED)
img_original_normalized = cv.normalize(img.astype(np.float32), None, 0.0, 1.0, cv.NORM_MINMAX)

FILE_PATH = 'ip_tensor_dump_jetson.bin'
data = np.fromfile(FILE_PATH, dtype=np.float32, count=128*128)
img_dumped_tensor_jetson = np.reshape(data, (128, 128))

FILE_PATH = 'ip_tensor_dump_laptop.bin'
data = np.fromfile(FILE_PATH, dtype=np.float32, count=128*128)
img_dumped_tensor_laptop = np.reshape(data, (128, 128))

img_dumped_tensor_laptop_normalized = cv.normalize(img_dumped_tensor_laptop, None, 0.0, 1.0, cv.NORM_MINMAX)

def print_min_max(img):
    print(f"  min: {img.min():.6f}")
    print(f"  max: {img.max():.6f}")
    print("")

print("img_original_normalized:")
print_min_max(img_original_normalized)

print("img_dumped_tensor_jetson:")
print_min_max(img_dumped_tensor_jetson)

print("img_dumped_tensor_laptop:")
print_min_max(img_dumped_tensor_laptop)

print("img_dumped_tensor_laptop_normalized:")
print_min_max(img_dumped_tensor_laptop_normalized)

def print_stats(img1, img2):
    diff = img1 - img2
    print(f"  abs(diff.min()): {abs(diff.min()):.6f}")
    print(f"  abs(diff.max()): {abs(diff.max()):.6f}")
    print(f"  abs(diff.mean()): {abs(diff.mean()):.6f}")
    print(f"  abs(diff.std()): {abs(diff.std()):.6f}")
    mse = np.mean((img1 - img2) ** 2)
    print(f'  MSE: {mse:.10f}')
    psnr = cv.PSNR(img1, img2)
    print(f'  PSNR: {psnr:.3f} dB')
    print("")

print("img_original_normalized vs img_original_normalized:")
print_stats(img_original_normalized, img_original_normalized)

print("img_dumped_tensor_jetson vs img_original_normalized:")
print_stats(img_dumped_tensor_jetson, img_original_normalized)

print("img_dumped_tensor_laptop vs img_original_normalized:")
print_stats(img_dumped_tensor_laptop, img_original_normalized)

print("img_dumped_tensor_laptop_normalized vs img_original_normalized:")
print_stats(img_dumped_tensor_laptop_normalized, img_original_normalized)

which gives:

img_original_normalized:
  min: 0.000000
  max: 1.000000

img_dumped_tensor_jetson:
  min: 0.000000
  max: 1.000000

img_dumped_tensor_laptop:
  min: 0.062745
  max: 0.921569

img_dumped_tensor_laptop_normalized:
  min: 0.000000
  max: 1.000000

img_original_normalized vs img_original_normalized:
  abs(diff.min()): 0.000000
  abs(diff.max()): 0.000000
  abs(diff.mean()): 0.000000
  abs(diff.std()): 0.000000
  MSE: 0.0000000000
  PSNR: 361.202 dB

img_dumped_tensor_jetson vs img_original_normalized:
  abs(diff.min()): 0.001960
  abs(diff.max()): 0.001960
  abs(diff.mean()): 0.000294
  abs(diff.std()): 0.001102
  MSE: 0.0000013002
  PSNR: 106.991 dB

img_dumped_tensor_laptop vs img_original_normalized:
  abs(diff.min()): 0.078431
  abs(diff.max()): 0.062745
  abs(diff.mean()): 0.041551
  abs(diff.std()): 0.030383
  MSE: 0.0026496141
  PSNR: 73.899 dB

img_dumped_tensor_laptop_normalized vs img_original_normalized:
  abs(diff.min()): 0.072440
  abs(diff.max()): 0.080154
  abs(diff.mean()): 0.000494
  abs(diff.std()): 0.023379
  MSE: 0.0005468346
  PSNR: 80.752 dB

It is notable that even after renormalizing the dumped tensor from the laptop (Docker container), it is still not as close to the original as the one dumped on the Jetson.

Of course I had done it, but I just redid the check, and yes, they are used in both tests.

Sorry I don’t understand this.

I also tried to add a caps filter after nvvideoconvert:

"nvvideoconvert ! video/x-raw(memory:NVMM),format=RGBA ! "

because it appeared to choose NV12 otherwise, but the accuracy is in the very same range, and when put through the stats code shown above the numbers, while a bit different, remain very close.

  1. nvstreammux only accepts NV12/RGBA/I420. In the DeepStream test there is no capsfilter after nvvideoconvert, so nvvideoconvert may output the NV12 format. Please refer to this FAQ for how to dump the pipeline graph; then you can check whether there is a data conversion, which would cause data loss.
  2. To reduce data conversion, please refer to the following pipeline:
appsrc name=appsrc caps=video/x-raw,format=RGB,width=128,height=128,framerate=0/1 ! nvvideoconvert ! video/x-raw\(memory:NVMM\),format=RGBA ! mux.sink_0 nvstreammux name=mux batch-size=1 width=128 height=128 ! nvinfer name=nvinfer config-file-path=config_nvinfer.yml ! fakesink

I modified the pipeline as requested, which consisted in changing the appsrc format attribute from GRAY8 to RGB.

Here is the dumped graph. I can’t see whether there is any data conversion or not. Do you?

  1. The nvinfer plugin and its low-level lib are open source, and they support converting RGBA to gray. You can make appsrc output the RGB format, because nvstreammux does not accept GRAY8.
  2. This is not the negotiated pipeline; you need to dump the pipeline in the playing state. If you still need to dump the pipeline, please refer to 1.diff in this topic.

So apparently there was some implicit conversion at some point, I suppose on the nvvideoconvert sink.

But the elements’ state clearly indicates they are playing. How can the pipeline not be negotiated if it is playing and actually inferring?

As you know, there are many GStreamer states; only in the playing state are some formats fixed. As 1.diff shows, you can use parse_state_changed to get the playing state.

OK, here is the apparently negotiated playing pipeline.

The only conversion I see is the RGB to RGBA in the Gstnvvideoconvert.

From the pipeline you shared, before nvinfer the data is still RGB/RGBA, which means no data loss; then nvinfer converts RGBA to gray to do the normalization. Please refer to my comment on Nov 11. Is there still a big difference?

EDIT: Sorry, there was a mistake in the previously shown data. Actually the results are even worse, because the image is duplicated 3 times side by side in a row, with some garbage in the remaining space below, and the whole thing is downscaled to match the 128×128 target.

So it looks wrong to set format=RGB on the appsrc with my grayscale source. Actually, gst-inspect-1.0 nvvideoconvert shows that nvvideoconvert accepts GRAY8, and it handled it perfectly on the Jetson.

Also, setting either RGBA or NV12 for nvvideoconvert appears to give similar results (both are also accurate on the Jetson).

If the preprocessed data between the DeepStream test and the TensorRT test is still different, I suggest comparing the intermediate values step by step. From the analysis above, before nvinfer there is no data loss because the format is always RGB/RGBA, so we only need to compare the RGBA→gray conversion and the gray normalization. Please use the following steps to narrow down this issue.

  1. Compare the result of the RGBA→gray conversion.
    Please use this FAQ to dump the gray values, then compare them with the TensorRT test.
  2. Compare the result of the gray normalization.
    If the results in step 1 are the same, please continue by comparing the gray normalization values. The ip_tensor_dump.bin mentioned on Nov 11 contains the gray normalization values; the formula is the gray value multiplied by NET_SCALE_FACTOR.

I edited my last message because the data it showed were wrongly labeled.

The patch partially failed to apply, so I resolved the rejected hunks manually:

/opt/nvidia/deepstream/deepstream-7.1/sources/libs/nvdsinfer# patch -p1 < dump_infer_input_to_file.patch.txt
patching file Makefile
Hunk #1 succeeded at 33 with fuzz 2 (offset 4 lines).
Hunk #2 FAILED at 38.
1 out of 2 hunks FAILED -- saving rejects to file Makefile.rej
patching file nvdsinfer_context_impl.cpp
Hunk #1 succeeded at 23 with fuzz 2.
Hunk #2 succeeded at 1564 (offset 303 lines).
Hunk #3 succeeded at 1753 (offset 245 lines).
Hunk #4 succeeded at 1896 (offset 293 lines).
Hunk #5 succeeded at 2398 (offset 392 lines).
patching file nvdsinfer_context_impl.h
Hunk #1 FAILED at 69.
Hunk #2 succeeded at 465 (offset 25 lines).
1 out of 2 hunks FAILED -- saving rejects to file nvdsinfer_context_impl.h.rej

Generated Makefile.rej:

--- Makefile
+++ Makefile
@@ -38,7 +38,7 @@ endif
 LIBS := -shared -Wl,-no-undefined \
 	 -lnvinfer -lnvinfer_plugin -lnvonnxparser -lnvparsers \
 	-L/usr/local/cuda-$(CUDA_VER)/lib64/ -lcudart \
-	-lopencv_objdetect -lopencv_imgproc -lopencv_core
+	-lopencv_objdetect -lopencv_imgproc -lopencv_core -lopencv_imgcodecs
 
 LIBS+= -L$(LIB_INSTALL_DIR) -lnvdsgst_helper -lnvdsgst_meta -lnvds_meta \
        -lnvds_inferutils -ldl \

Manual resolution for Makefile:

LIBS := -shared -Wl,-no-undefined \
	 -lnvinfer -lnvinfer_plugin -lnvonnxparser -lpthread \
	-L/usr/local/cuda-$(CUDA_VER)/lib64/ -lcudart

ifeq ($(WITH_OPENCV),1)
-LIBS += -lopencv_objdetect -lopencv_imgproc -lopencv_core
+LIBS += -lopencv_objdetect -lopencv_imgproc -lopencv_core -lopencv_imgcodecs
endif

LIBS+= -L$(LIB_INSTALL_DIR) -lnvdsgst_helper -lnvdsgst_meta -lnvds_meta \
       -lnvds_inferlogger -lnvds_inferutils -ldl \
       -Wl,-rpath,$(LIB_INSTALL_DIR)

Generated nvdsinfer_context_impl.h.rej:

--- nvdsinfer_context_impl.h
+++ nvdsinfer_context_impl.h
@@ -69,6 +69,10 @@ public:
     }
     bool setScaleOffsets(float scale, const std::vector<float>& offsets = {});
     bool setMeanFile(const std::string& file);
+#ifdef DUMP_INPUT_TO_FILE
+    float getScale() { return m_Scale; };
+    NvDsInferFormat getNetworkFormat() { return m_NetworkInputFormat; };
+#endif
 
     NvDsInferStatus allocateResource();
     NvDsInferStatus syncStream();

Manual resolution for nvdsinfer_context_impl.h:

    bool setScaleOffsets(float scale, const std::vector<float>& offsets = {});
    bool setMeanFile(const std::string& file);
+#ifdef DUMP_INPUT_TO_FILE
+    float getScale() { return m_Scale; };
+    NvDsInferFormat getNetworkFormat() { return m_NetworkInputFormat; };
+#endif
    bool setInputOrder(const NvDsInferTensorOrder order);

    NvDsInferStatus allocateResource();
    NvDsInferStatus syncStream();

For the compilation I had to call make WITH_OPENCV=1 so that it could resolve a #include <opencv2/imgcodecs.hpp>.

I moved the binary:

# mv libnvds_infer.so /opt/nvidia/deepstream/deepstream-7.1/lib

When I run my app.py:

(python3:91177): GStreamer-WARNING **: 17:13:04.124: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_infer.so': libde265.so.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/root/workspace/project/deepstream/app.py", line 12, in <module>
    pipeline = Gst.parse_launch(
gi.repository.GLib.GError: gst_parse_error: no element "nvinfer" (1)

so:

# ls /usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream | grep libnvdsgst_infer.so
libnvdsgst_infer.so

it seems it is libde265.so which is missing somewhere.

  1. Yes, nvvideoconvert can accept GRAY8, but nvstreammux can’t, so nvvideoconvert will convert GRAY8 to RGBA for nvstreammux. To reduce format conversion, please use an “appsrc(RGBA) → nvvideoconvert → RGBA caps → nvstreammux …” pipeline.
    Please refer to my last comment. The checking method: first, make sure the DeepStream and TensorRT tests use the same RGB input; second, compare the result of the RGBA→gray conversion; third, compare the result of the gray normalization.
  2. About the “libde265.so” issue, please run /opt/nvidia/deepstream/deepstream/user_additional_install.sh to install the dependency. If it still doesn’t work, please run the following commands to check what is not found:
ldd /opt/nvidia/deepstream/deepstream/lib/gst-plugins/libnvdsgst_infer.so
ldd /opt/nvidia/deepstream/deepstream/lib/libnvds_infer.so

No, the original format is meant to be GRAY8, and nvvideoconvert is there exactly for the purpose of converting it to something nvinfer accepts; it works just right on the Jetson, and almost right in the Docker container on my x86 laptop.

Plus, as I just said, setting RGB on appsrc produces a very bad image with 3 copies of the same image in a row, the whole thing downscaled, etc… It’s similar with RGBA.

With RGB:

With RGBA:

No way it could work.

OK, I ran user_additional_install.sh and now libnvds_infer.so runs without error, but I can’t see any generated file. Is it meant to be the file generated with the config option dump-input-tensor? If yes, it is exactly as before, i.e. a faithful image but with the shifted min and max.

Today I tried to run the Jetson with the following pipeline, i.e. setting nvvideoconvert’s format to RGBA instead of the implicit NV12:

pipeline = Gst.parse_launch(
    "appsrc name=appsrc caps=video/x-raw,format=GRAY8,width=128,height=128,framerate=0/1 ! "
    "queue ! "
    "nvvideoconvert ! video/x-raw(memory:NVMM),format=RGBA ! "
    "mux.sink_0 nvstreammux name=mux batch-size=1 width=128 height=128 ! "
    "nvinfer name=nvinfer config-file-path=config_nvinfer.yml ! "
    "fakesink"
)

and the output tensor has the very same range of error as either NV12 or RGBA gives on my laptop.

Anyway, I’m running out of time. Maybe DeepStream is not well adapted to / tested with grayscale images?

Thanks for sharing! How did you generate that ip_tensor_dump_RGB?
Could you share the image.png and the gray-input model by forum private email? Let me check whether there is any DeepStream issue.

With this pipeline in the app.py:

pipeline = Gst.parse_launch(
    "appsrc name=appsrc caps=video/x-raw,format=RGB,width=128,height=128,framerate=0/1 ! "
    "queue ! "
    "nvvideoconvert ! "
    "mux.sink_0 nvstreammux name=mux batch-size=1 width=128 height=128 ! "
    "nvinfer name=nvinfer config-file-path=config_nvinfer.yml ! "
    "fakesink"
)

and this setting in the config_nvinfer.yml:

 dump-input-tensor: 1

and to read the generated ip_tensor_dump.bin:

import cv2 as cv
import numpy as np

data = np.fromfile('ip_tensor_dump.bin', dtype=np.float32, count=128*128)
img = np.reshape(data, (128, 128))
cv.imwrite('ip_tensor_dump.png', img * 255)  # scale back to 0-255 before saving as 8-bit PNG

What image.png? Any grayscale PNG that loads into an np.ndarray of dtype=np.uint8, shape=(128, 128) and is then normalized produces the same kind of dumped input tensor, which is no longer normalized, so there’s no point sending you something like lenna.png.

About the gray input model: what is it, given that, as I just said, the patched libnvds_infer.so didn’t produce anything? If it is the result of dump-input-tensor: 1, then you already have it with the ip_tensor_dump_RGB(/A) pictures provided just above; the only relevant aspect is their min/max, since they should be normalized and they aren’t, minus the fact that cv.imwrite quantizes the PNG to 8 bits, so you can’t get a min value as precise as min=0.06274510174989700317 but rather something like 0.062 × 255 = 16.

Maybe it’s relevant for bug tracking to note that 0.06274510174989700317 × 255 = 16.000000946 and 0.92156869173049926758 × 255 = 235.000016391, so the issue seems to occur while the data is in uint8 form and not float.
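
In fact, 16 and 235 are exactly the limits of limited-range (“video range”) luma, so one hypothesis consistent with these numbers, which I haven’t verified, is that a full-range to limited-range conversion sneaks in somewhere during the color conversion; a minimal sketch of that hypothetical mapping:

import numpy as np

def full_to_limited_range(y_full: np.ndarray) -> np.ndarray:
    # Hypothetical limited-range ("video range") mapping: 0..255 -> 16..235
    return 16.0 + (219.0 / 255.0) * y_full.astype(np.float32)

print(full_to_limited_range(np.array([0, 255], dtype=np.uint8)) / 255.0)
# ~[0.0627451, 0.9215686], i.e. exactly the min/max observed in the dumped tensor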

If the original format is GRAY8, nvvideoconvert will convert GRAY8 to RGBA because nvstreammux can’t accept GRAY8, and then nvinfer will convert RGBA to gray for the model input tensor. I did a test: the first pipeline decodes the PNG and outputs GRAY8 data without any modification; the second pipeline decodes, then converts to RGBA, then converts back to GRAY8, then saves. After checking, a.bin and b.bin are not the same, but the difference is minimal.

gst-launch-1.0  filesrc location=/home/rec/2.png ! pngdec ! filesink location=a.bin
gst-launch-1.0  filesrc location=/home/rec/2.png ! pngdec ! nvvideoconvert  ! 'video/x-raw, format=RGBA' !  nvvideoconvert  ! 'video/x-raw, format=GRAY8' ! filesink location=b.bin
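
To quantify that difference, a quick sketch (assuming a.bin and b.bin are raw single-plane GRAY8 dumps of the same frame, without row padding):

import numpy as np

a = np.fromfile('a.bin', dtype=np.uint8)
b = np.fromfile('b.bin', dtype=np.uint8)
n = min(a.size, b.size)  # guard against differently sized dumps
diff = np.abs(a[:n].astype(np.int16) - b[:n].astype(np.int16))
print(f"max: {diff.max()}  mean: {diff.mean():.3f}  changed pixels: {(diff > 0).mean():.1%}")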