Can't configure DeepStream classifier to give the same softmax outputs as the TRT engine it builds

Summary: I can’t get DeepStream to produce correct results for a custom classifier no matter how I set the config for nvinfer.

I custom-trained a 2-class ResNet34 classifier with a softmax output in PyTorch and saved this model. I’ve converted and saved this model in 32-bit precision to both ONNX (which I run via onnxruntime) and TRT (via trtexec). Additionally, I’ve converted the ONNX model to TRT via nvinfer, which is how I’ve set up the DeepStream config.

The models require normalisation during preprocessing, and when this is set up equivalently, all four of these models produce exactly the same softmax output (matching to several decimal places) over a range of images as well as a video. The normalisation used looks like this:

from torchvision import transforms

FIXED_OFFSET = 0.449   # per-channel mean, for pixel values in [0, 1]
FIXED_STDDEV = 0.226   # per-channel standard deviation, for pixel values in [0, 1]
transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                 # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize([FIXED_OFFSET] * 3, [FIXED_STDDEV] * 3)
])

However, I can’t get nvinfer to output the same results within a DeepStream pipeline, whether the classifier is used for primary inference or secondary inference. (I know the results are wrong because I’m outputting them from NvDsClassifierMeta and comparing them to inference outside of DS.) There is no image resizing implied by my config, and I’ve checked all sensible variations of the normalisation settings, which to my knowledge are net-scale-factor, offsets, and model-color-format. I use a constant normalisation across all channels so I can’t be getting the RGB channels confused.
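For reference, the way I read the scores out of the metadata is roughly this (a simplified sketch of my pad probe for the per-object/SGIE case, using the standard pyds bindings; the function name and print format are just illustrative):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def classifier_meta_probe(pad, info, _udata):
    """Print the label and score the classifier attached to every object."""
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(info.get_buffer()))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            l_cls = obj_meta.classifier_meta_list
            while l_cls is not None:
                cls_meta = pyds.NvDsClassifierMeta.cast(l_cls.data)
                l_label = cls_meta.label_info_list
                while l_label is not None:
                    label_info = pyds.NvDsLabelInfo.cast(l_label.data)
                    # result_prob holds the score of the reported class
                    print(frame_meta.frame_num, label_info.result_label,
                          label_info.result_prob)
                    l_label = l_label.next
                l_cls = l_cls.next
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK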

The two obvious possibilities are: either I’ve still got something that needs to be changed in the config, or nvinfer is doing something it shouldn’t be. I did notice one bizarre behavior: if I change scaling-filter, the softmax outputs change completely. This is unexpected because there is no need for image/video scaling anywhere in my DeepStream pipeline.

My pipeline is equivalent to this:

gst-launch-1.0 filesrc location=example.avi ! h264parse ! avdec_h264 ! nvvideoconvert ! \
    m.sink_0 nvstreammux name=m batch-size=1 batched-push-timeout=40000 width=224 height=224 ! \
    nvinfer config-file-path=primary_classification_test.txt unique-id=1 ! nvdsosd ! nveglglessink

The nvinfer config looks like this:

[property]
gpu-id=0
infer-dims=3;224;224
net-scale-factor=0.017352074
offsets=114.495;114.495;114.495
onnx-file=/opt/nvidia/deepstream/deepstream-6.3/sources/project/person_classifier_test.onnx
labelfile-path=secondary_labels.txt
#force-implicit-batch-dim=1
batch-size=1
#model-color-format=0
process-mode=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
is-classifier=1
output-blob-names=predictions/Softmax
#output-blob-names=output
#classifier-async-mode=1
classifier-threshold=0
maintain-aspect-ratio=0
input-object-min-width=0
input-object-min-height=0
#operate-on-gie-id=1
#operate-on-class-ids=0;1;2;3
classifier-type=personclassifier
#scaling-filter=0
scaling-compute-hw=0

What am I missing?

• Hardware Platform (Jetson / GPU) RTX 2080
• DeepStream Version 6.3
• JetPack Version (valid for Jetson only)
• TensorRT Version 8.5.3.0
• NVIDIA GPU Driver Version (valid for GPU only) 530.41.03
• Issue Type( questions, new requirements, bugs) bugs/question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) Described above
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

I don’t think this pipeline is the same as the TensorRT pipeline.

  1. You introduce extra image scaling by setting the nvstreammux output resolution to something different from the original video resolution.

  2. There is a hardware video decoder in the RTX 2080. Please use the hardware video decoder instead of the software decoder “avdec_h264”.

  3. Please set the nvstreammux output resolution to be the same as the original video resolution.

  4. Please check the PyTorch documentation to identify the image scaling algorithm: torchvision.transforms — Torchvision master documentation (pytorch.org). You need to set the “scaling-filter” parameter so that nvinfer does the same scaling.

  5. You give the PyTorch normalization as:

The formula can be found in the PyTorch documentation: torchvision.transforms — Torchvision master documentation (pytorch.org).

And we also give the nvinfer normalization formula here: Gst-nvinfer — DeepStream 6.3 Release documentation (please check the red text).

These are simple multiplications and additions, so it is easy to translate the PyTorch parameters into nvinfer parameters (see the short sketch after this list).

  6. Please refer to DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums
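For example, this is the translation in code form (a sketch only, assuming nvinfer receives 0-255 pixel values and the torchvision mean/std are defined on [0, 1] tensors):

# torchvision:  y = (x / 255 - mean) / std           with x in 0..255
# nvinfer:      y = net-scale-factor * (x - offset)
# Matching the two term by term:
#   offset           = 255 * mean
#   net-scale-factor = 1 / (255 * std)
mean, std = 0.449, 0.226                       # values from the training transform
print("offsets =", 255 * mean)                 # 114.495
print("net-scale-factor =", 1 / (255 * std))   # ~0.0173521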

Hi and thanks for your response.

I’m not sure what you mean by this not being the same pipeline as the TensorRT pipeline. This is a valid pipeline, and it uses nvinfer, so inherently it uses TensorRT.

Nevertheless:

You introduce extra image scaling by setting the nvstreammux output resolution to something different from the original video resolution.

My input video is 224x224, the streammux setting is 224x224, and my classifier network input dimension is 224x224. In theory, no scaling is configured anywhere.

There is a hardware video decoder in the RTX 2080. Please use the hardware video decoder instead of the software decoder “avdec_h264”.

While your point is valid and my method is less efficient, I don’t see how this can affect the validity of the result, unless hardware/software decoding somehow produces different images?

Please check the PyTorch documentation to identify the image scaling algorithm.

The input video is 224x224, so no scaling takes place in PyTorch. I can confirm that changing the scaling algorithm doesn’t affect the results of PyTorch inference.
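For instance, here is roughly how I checked it, comparing the preprocessed tensors directly (if those are bit-identical across interpolation modes, the model outputs must be identical too; frame_224.png is just a stand-in name for a single dumped 224x224 frame):

import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms import InterpolationMode

img = Image.open("frame_224.png")  # one 224x224 frame dumped from the video (stand-in name)

def preprocess(interp):
    tfm = transforms.Compose([
        transforms.Resize((224, 224), interpolation=interp),
        transforms.ToTensor(),
        transforms.Normalize([0.449] * 3, [0.226] * 3),
    ])
    return tfm(img)

ref = preprocess(InterpolationMode.BILINEAR)
for mode in (InterpolationMode.NEAREST, InterpolationMode.BICUBIC):
    # prints whether the tensor is exactly equal to the bilinear one
    print(mode, torch.equal(ref, preprocess(mode)))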

You need to set the “scaling-filter” parameter so that nvinfer does the same scaling.

It shouldn’t matter which type of scaling I choose because no scaling should take place. However, changing “scaling-filter” does change the results in DeepStream inference dramatically, which makes me think that DeepStream is doing unnecessary or incorrect scaling within this pipeline.

You give the PyTorch normalization as:

My normalization parameters appear to be consistent with the nvinfer formula as well as other answers I found within the NVIDIA forums.

How did you calculate these values?

And please set “network-type=1” in your configuration file.

My PyTorch training is based on an offset of 0.449 and standard deviation of 0.226. These are for pixel values in the range [0, 1].
DeepStream is using values 0-255, so the offset is 0.449 * 255 = 114.495.
Converting 0-255 values to 0-1 and then further dividing by the standard deviation gives (1/255) / 0.226 = 0.017352074.
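As a quick sanity check of my own, the two formulas agree numerically across the whole 0-255 range:

import numpy as np

x = np.arange(256, dtype=np.float64)
pytorch_norm = (x / 255.0 - 0.449) / 0.226        # normalisation used in training/inference
nvinfer_norm = 0.017352074 * (x - 114.495)        # net-scale-factor * (x - offsets)
print(np.abs(pytorch_norm - nvinfer_norm).max())  # on the order of 1e-7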

I’ve added network-type=1 now. The results are identical to before.

The configuration seems correct now. Can you provide your model and code for us to reproduce the issue?

Yes, may I message you the model and code privately?

Yes, please send me a message.

In the code and model I’ve sent you, the way I test the output is like this:

# Train a model (note: image dataset not included)
python train.py

# Inference outside DeepStream
python classifier_check.py example.avi

# Inference inside DeepStream
python nvidia_test.py example.avi classifier_test.json

For the 224x224 example.avi video I’ve provided, some of the differences are shown in the attached image.

This isn’t a realistic video we’d be using, but for our real videos, the differences in output values are not just up to 0.03 but can even exceed 0.4.

Please provide the configuration files and model too.

What is the data you have shown here?

The model was provided with the ZIP file I sent you. Here are the missing config files.

labels.txt (3 Bytes)
classification_test.txt (594 Bytes)

Each of these two commands outputs the softmax probabilities for each frame of example.avi. The mean pixel value of the image is also printed out just to confirm that the input images are identical.

python classifier_check.py example.avi
python nvidia_test.py example.avi classifier_test.json

The diff shows the non-DeepStream inference output (left) and the DeepStream inference output (right). When nvinfer is configured correctly, I’m expecting these results to match exactly, which they don’t.
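For context, classifier_check.py boils down to something like this (heavily simplified here; the model path and loading are placeholders, the real script is in the ZIP):

import sys
import cv2
import torch
import torch.nn.functional as F
from torchvision import transforms

# Decode with OpenCV, apply the training normalisation, print the per-frame
# softmax plus the mean pixel value so the input frames themselves can be
# compared against what DeepStream sees.
model = torch.load("person_classifier.pt")   # placeholder path for the model in the ZIP
model.eval()
normalize = transforms.Normalize([0.449] * 3, [0.226] * 3)

cap = cv2.VideoCapture(sys.argv[1])
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    print("mean pixel value:", frame_rgb.mean())
    x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float() / 255.0
    x = normalize(x).unsqueeze(0)
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
    print("softmax:", probs.squeeze().tolist())
cap.release()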

Let’s check the preprocessing that happens before inference with the model.

The PyTorch script pipeline is:
video frame read by OpenCV (decoding + YUV to RGB conversion) => PyTorch scaling => normalization => conversion to tensor

The DeepStream pipeline is:

video decoding => scaling + YUV to RGB conversion => normalization + conversion to tensor

Currently, the DeepStream YUV to RGB conversion + scaling does not produce output that is bit-for-bit aligned with OpenCV + PyTorch.

Have you found any wrong classification results with your test case? Is any frame identified as a different class compared to the PyTorch result?

With the initial video I sent you, the classification scores from DeepStream and non-DeepStream inference are very similar across all frames.

However, the scores are much less consistent for more realistic input derived from our test data. I’ll send you an example video of this. This video has 31 frames, and 2 of the frames are misclassified. Several of the other frames also have large discrepancies in class scores.

Here’s a side-by-side comparison showing the different results. You can reproduce these numbers exactly by running the provided scripts on this second video. (These are from the same scripts but just with output formatted differently.)

In order to make the DeepStream pipeline similar or identical to the PyTorch pipeline, you may need to skip gst-nvinfer’s internal preprocessing. Please customize the preprocessing (scaling, format conversion, normalization) with the nvdspreprocess plugin (Gst-nvdspreprocess (Alpha) — DeepStream 6.4 documentation) to make the input tensor the same as PyTorch’s.

I’ve followed your suggestion and set up a new pipeline with nvdspreprocess, equivalent to this:

gst-launch-1.0 filesrc location=example.avi ! h264parse ! avdec_h264 ! nvvideoconvert ! \
    m.sink_0 nvstreammux name=m batch-size=1 width=224 height=224 ! \
    nvdspreprocess config-file=preprocess.txt unique-id=0 target-unique-ids=1 ! \
    nvinfer config-file-path=primary_classification_test.txt unique-id=1 input-tensor-meta=0 ! nvdsosd ! nveglglessink

The preprocess.txt config I’m using:

[property]
enable=1
    # list of component gie-id for which tensor is prepared
target-unique-ids=1
    # 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=0
    # 0=process on objects 1=process on frames
process-on-frame=1
    #uniquely identify the metadata generated by this element
unique-id=0
    # gpu-id to be used
gpu-id=0
    # if enabled maintain the aspect ratio while scaling
maintain-aspect-ratio=1
    # if enabled pad symmetrically with maintain-aspect-ratio enabled
symmetric-padding=1
    # processing width/height at which the image is scaled
processing-width=224
processing-height=224
    # max buffer in scaling buffer pool
scaling-buf-pool-size=6
    # max buffer in tensor buffer pool
tensor-buf-pool-size=6
    # tensor shape based on network-input-order
network-input-shape=1;3;224;224
    # 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
    # 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0
    # tensor name same as input layer name
tensor-name=input
    # 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE 3=NVBUF_MEM_CUDA_UNIFIED
scaling-pool-memory-type=0
    # 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU 2=NvBufSurfTransformCompute_VIC
scaling-pool-compute-hw=1
    # Scaling Interpolation method
    # 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
    # 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
    # 6=NvBufSurfTransformInter_Default
scaling-filter=0
    # custom library .so path having custom functionality
custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/gst-plugins/libcustom2d_preprocess.so
    # custom tensor preparation function name having predefined input/outputs
    # check the default custom library nvdspreprocess_lib for more info
custom-tensor-preparation-function=CustomTensorPreparation

[user-configs]
   # Below parameters get used when using default custom library nvdspreprocess_lib
   # network scaling factor
#pixel-normalization-factor=0.017352074
pixel-normalization-factor=100.0
   # mean file path in ppm format
#mean-file=
   # array of offsets for each channel
#offsets=114.495;114.495;114.495
offsets=0;0;0

[group-0]
src-ids=0
custom-input-transformation-function=CustomAsyncTransformation
process-on-roi=0

Meanwhile, nvinfer is configured exactly as before, except for input-tensor-meta=0 or its equivalent input-tensor-from-meta=0 within the config file.

Under this pipeline and config, scaling/normalisation is still being applied by nvinfer, not nvdspreprocess. I can tell because changing net-scale-factor and offsets in the nvinfer config file makes a difference, whereas changing pixel-normalization-factor and offsets in the [user-configs] section of the nvdspreprocess config makes no difference.

What must I do to make the preprocessing happen in nvdspreprocess rather than nvinfer? The docs read as though target-unique-ids and input-tensor-meta should achieve this.

Please customize your own nvdspreprocess low-level library to be compatible with the PyTorch algorithms. The sample nvdspreprocess library does not match your requirement.