Bad results running uff classifier (mobilenet) with deepstream

motyaedu · September 7, 2020, 10:06am

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson TX2
• DeepStream Version
5.0-20.07
• JetPack Version (valid for Jetson only)
4.4 [L4T 32.4.3]
• TensorRT Version
7.1.3.0

I have converted a TensorFlow MobileNet network to a UFF model using the following procedure:

Create a TensorRT-compatible TensorFlow graph_def using the tf_trt_models code.
Convert it to UFF using the code:

_ = uff.from_tensorflow(
    graph_def,
    output_nodes=output_names,
    output_filename="mobilenet.uff",
    text=True,
    debug_mode=True,
)

Create engine (.bin) file using the code:

with trt.Builder(
    TRT_LOGGER
) as builder, builder.create_network() as network, trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 28
    builder.max_batch_size = 1
    builder.fp16_mode = True

    # Register the UFF input and output tensors before parsing.
    parser.register_input("input", (3, 224, 224))
    for output_name in output_names:
        print(f"Registered output {output_name}")
        parser.register_output(output_name)
    parser.parse("mobilenet.uff", network)
    engine = builder.build_cuda_engine(network)

    # Serialize the engine so it can be loaded later without rebuilding.
    buf = engine.serialize()
    with open("mobilenet.bin", "wb") as f:
        f.write(buf)

Then, I have tested the model-engine-file (mobilenet.bin) using this python code:

class TrtMobilenet(object):
    def _load_engine(self):
        with open(self.model_path, "rb") as f, trt.Runtime(self.trt_logger) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def _create_context(self):
        for binding in self.engine:
            size = (
                trt.volume(self.engine.get_binding_shape(binding))
                * self.engine.max_batch_size
            )
            host_mem = cuda.pagelocked_empty(size, np.float32)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)
            self.bindings.append(int(cuda_mem))
            if self.engine.binding_is_input(binding):
                self.host_inputs.append(host_mem)
                self.cuda_inputs.append(cuda_mem)
            else:
                self.host_outputs.append(host_mem)
                self.cuda_outputs.append(cuda_mem)
        return self.engine.create_execution_context()

    def __init__(self, model_path, input_shape):
        """Initialize TensorRT plugins, engine and context."""
        self.model_path = model_path
        self.input_shape = input_shape
        self.trt_logger = trt.Logger(trt.Logger.INFO)
        self.engine = self._load_engine()

        self.host_inputs = []
        self.cuda_inputs = []
        self.host_outputs = []
        self.cuda_outputs = []
        self.bindings = []
        self.stream = cuda.Stream()
        self.context = self._create_context()

    def __del__(self):
        """Free CUDA memories."""
        del self.stream
        del self.cuda_outputs
        del self.cuda_inputs

    def read(self, path):
        """Read and resize image."""
        img = Image.open(path).resize(self.input_shape)
        return np.asarray(img)

    def preprocess(self, img):
        img = img.transpose((2, 0, 1)).astype(np.float32)
        # No normalization needed here
        # img *= 2.0 / 255.0
        # img -= 1.0
        return img

    def detect(self, path):
        """Detect objects in the input image."""
        img_resized = self.read(path)
        img_resized = self.preprocess(img_resized)
        np.copyto(self.host_inputs[0], img_resized.ravel())

        cuda.memcpy_htod_async(self.cuda_inputs[0], self.host_inputs[0], self.stream)
        self.context.execute_async(
             batch_size=1, bindings=self.bindings, stream_handle=self.stream.handle
        )
        cuda.memcpy_dtoh_async(self.host_outputs[0], self.cuda_outputs[0], self.stream)
        self.stream.synchronize()

        output = self.host_outputs[0]

        return img_resized, output

model = TrtMobilenet("mobilenet.bin", (224, 224))
img, scores = model.detect("frame.jpg")

It works as expected, returning the exact same results as the original tensorflow model.

Finally, I have integrated this model into DeepStream using the following pipeline:

gst-launch-1.0 multifilesrc location=${images} caps="image/jpeg,framerate=1/1" ! \
  jpegparse ! \
  nvv4l2decoder ! \
  nvvideoconvert ! \
  'video/x-raw(memory:NVMM),format=(string)NV12' ! \
  mux.sink_0 nvstreammux live-source=0 name=mux batch-size=1 width=224 height=224 ! \
  nvinfer config-file-path=mobilenet.txt batch-size=1 process-mode=1 ! \
  nvstreamdemux name=demux demux.src_0 ! \
  nvvideoconvert ! \
  nvdsosd ! \
  nvvideoconvert ! \
  nvv4l2h265enc ! \
  h265parse ! \
  qtmux ! \
  filesink location=detections.mp4

and its corresponding mobilenet.txt configuration file:

[property]
gpu-id=0
net-scale-factor=1.0
uff-file=mobilenet.uff
model-engine-file=mobilenet.bin
input-dims=3;224;224;0
uff-input-blob-name=input
output-blob-names=scores
labelfile-path=labels.txt
num-detected-classes=2
batch-size=2
model-color-format=1
network-mode=2
is-classifier=1
process-mode=1
classifier-async-mode=0
classifier-threshold=0.
operate-on-gie-id=1
gie-unique-id=4
#parse-classifier-func-name=NvDsInferClassiferParseCustomSoftmax
#custom-lib-path=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_infercustomparser.so

The results (softmax probabilities) are different and wrong. What am I doing wrong?
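When comparing the two inference paths, it can help to list the nvinfer settings that affect preprocessing next to what the Python script does. The following is a minimal sketch using only the standard library; the function name and the selection of keys are assumptions made for illustration, not an official or exhaustive list.

```python
import configparser

# Keys from the [property] group that influence how frames reach the network.
# This selection is an assumption made for this sketch.
PREPROCESS_KEYS = (
    "net-scale-factor",
    "offsets",
    "model-color-format",
    "input-dims",
    "network-mode",
)

def preprocessing_settings(config_text):
    """Return the preprocessing-related keys found in an nvinfer config string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    props = parser["property"]
    return {key: props.get(key, "<not set>") for key in PREPROCESS_KEYS}

# A shortened copy of the configuration posted above.
SAMPLE = """\
[property]
gpu-id=0
net-scale-factor=1.0
input-dims=3;224;224;0
model-color-format=1
network-mode=2
#parse-classifier-func-name=NvDsInferClassiferParseCustomSoftmax
"""

settings = preprocessing_settings(SAMPLE)
for key, value in settings.items():
    print(f"{key} = {value}")
```

Keys reported as "<not set>" fall back to the plugin defaults, so they are worth checking against the plugin manual.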

AastaLLL · September 8, 2020, 4:19am

Hi,

It seems that the color formats of the image and the network are different.

The configuration file sets the network color format to BGR.
https://docs.nvidia.com/metropolis/deepstream/plugin-manual/index.html

Property	Meaning	Type and Range	Example
model-color-format	Color format required by the model	Integer 0: RGB 1: BGR 2: GRAY	model-color-format=0

But the input image format is NV12.

Would you mind changing the output color format of nvvideoconvert to BGRx and trying again?

Thanks.

motyaedu · September 8, 2020, 8:53am

Hi,

Yes, actually I made a typo: the model format is RGB. Therefore, I have set model-color-format=0. Moreover, following your advice, I have changed the output format of nvvideoconvert to RGBA, so the pipeline looks like this:

  gst-launch-1.0 multifilesrc location=${images} caps="image/jpeg,framerate=1/1" ! \
  jpegparse ! \
  nvv4l2decoder ! \
  nvvideoconvert ! \
  'video/x-raw(memory:NVMM),format=(string)RGBA' ! \
  mux.sink_0 nvstreammux live-source=0 name=mux batch-size=1 width=224 height=224 ! \
  nvinfer config-file-path=/usr/share/pt/data/classifiers/classifier.txt batch-size=1 process-mode=1 ! \
  nvstreamdemux name=demux demux.src_0 ! \
  nvvideoconvert ! \
  nvdsosd ! \
  nvvideoconvert ! \
  nvv4l2h265enc ! \
  h265parse ! \
  qtmux ! \
  filesink location=detections.mp4

The results are better but they are not the same… These are the softmax probabilities I get for 20 jpg images, for both the python code and DeepStream:

	python class 0	python class 1	DeepStream class 0	DeepStream class 1
frame_00.jpg	1	0	0.999512	0.000725
frame_01.jpg	0.847167	0.152832	0.486816	0.513184
frame_02.jpg	0.998046	0.001985	0.999023	0.001146
frame_03.jpg	0.998535	0.001543	0.999512	0.000625
frame_04.jpg	0.995117	0.004680	0.999512	0.000708
frame_05.jpg	1	0	1	0
frame_06.jpg	0.625976	0.373779	0.343750	0.656250
frame_07.jpg	0.997558	0.002470	0.998047	0.001932
frame_08.jpg	0.985839	0.013938	0.985840	0.014206
frame_09.jpg	1	0	0.992676	0.007107
frame_10.jpg	0.870605	0.129638	0.590820	0.409180
frame_11.jpg	0.961425	0.038818	0.695801	0.303955
frame_12.jpg	0.969238	0.030563	0.912598	0.087280
frame_13.jpg	0.999023	0.001027	0.999023	0.001186
frame_14.jpg	0.500488	0.499511	0.367920	0.631836
frame_15.jpg	0.018646	0.981445	0.023148	0.977051
frame_16.jpg	0.019958	0.979980	0.011093	0.988770
frame_17.jpg	0.020172	0.979980	0.024063	0.976074
frame_18.jpg	0.036224	0.963867	0.010635	0.989258
frame_19.jpg	0.097412	0.902832	0.036041	0.963867
frame_20.jpg	0.180419	0.819824	0.088318	0.911621

As long as the model is confident and the probabilities are close to 1 and 0, the results for both DeepStream and python are quite similar. On the other hand, probabilities in the middle range are far away from each other.
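One way to read this observation: for a two-class softmax, the class 0 probability is the logistic function of the logit difference z0 - z1, and its slope is p(1 - p). A given change in the logits therefore moves mid-range probabilities far more than saturated ones, and near-saturated probabilities can hide a real mismatch. The sketch below converts a few rows of the table back to logit differences; the helper name `logit_gap` is hypothetical.

```python
import math

def logit_gap(p0, p1):
    """Logit difference z0 - z1 implied by two-class softmax probabilities."""
    return math.log(p0 / p1)

# (python class 0, python class 1, DeepStream class 0, DeepStream class 1),
# copied from the table above.
rows = {
    "frame_01.jpg": (0.847167, 0.152832, 0.486816, 0.513184),
    "frame_02.jpg": (0.998046, 0.001985, 0.999023, 0.001146),
    "frame_08.jpg": (0.985839, 0.013938, 0.985840, 0.014206),
}

changes = {}
for name, (py0, py1, ds0, ds1) in rows.items():
    prob_change = abs(py0 - ds0)
    gap_change = abs(logit_gap(py0, py1) - logit_gap(ds0, ds1))
    changes[name] = (prob_change, gap_change)
    print(f"{name}: probability change {prob_change:.4f}, "
          f"logit-gap change {gap_change:.3f}")
```

Comparing logit differences (or the raw pre-softmax outputs, if they are available) gives a measure of the mismatch that is not compressed by saturation.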

I do not know, maybe there are still small differences between the image arrays fed into the python model engine and the DeepStream model engine.

Do you think it is related to differences in JPEG decoding between Pillow and jpegparse ! nvv4l2decoder?
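Two decoding paths can produce slightly different RGB values from the same JPEG data, for example when they apply different YCbCr-to-RGB equations. As an illustration only (this is not a statement about what Pillow or nvv4l2decoder actually do), the sketch below converts one YCbCr pixel with the full-range JFIF equations and with the limited-range BT.601 equations. The function names and the sample pixel are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))

def ycbcr_to_rgb_full(y, cb, cr):
    """Full-range (JFIF) BT.601 conversion, all components in 0..255."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(clamp8(c) for c in (r, g, b))

def ycbcr_to_rgb_limited(y, cb, cr):
    """Limited-range BT.601 conversion (Y in 16..235, Cb/Cr in 16..240)."""
    luma = 1.164 * (y - 16)
    r = luma + 1.596 * (cr - 128)
    g = luma - 0.392 * (cb - 128) - 0.813 * (cr - 128)
    b = luma + 2.017 * (cb - 128)
    return tuple(clamp8(c) for c in (r, g, b))

pixel = (120, 110, 150)  # arbitrary (Y, Cb, Cr) sample
full = ycbcr_to_rgb_full(*pixel)
limited = ycbcr_to_rgb_limited(*pixel)
diff = tuple(abs(a - b) for a, b in zip(full, limited))
print("full range:   ", full)
print("limited range:", limited)
print("abs difference:", diff)
```

Differences of a few intensity levels per channel are hard to see but can be enough to move a classifier's output when the model is not confident.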

Do you think there is something missing in the normalization? I have also tried to set net-scale-factor=1.0 and offsets=0;0;0.

Thank you in advance.

AastaLLL · September 9, 2020, 2:06am

Hi,

This may happen if the preprocessing stages in DeepStream and Python differ.

Would you mind sharing a reproducible source with us?
We want to reproduce this and check it further before giving our next suggestion.

Thanks.

motyaedu · September 9, 2020, 7:35am

Hi,

I can share a reproducible source (a docker image) with you privately.

Thank you.

preronamajumder · September 9, 2020, 8:57am

You can try changing the net-scale-factor. For an SSD MobileNet v2 that I trained using the TensorFlow Object Detection API, I use net-scale-factor=0.03, which gives the same detections in DeepStream as on the PC.

motyaedu · September 9, 2020, 11:50am

Yes, thank you. I just tried it, and the results were different and bad :(

Actually, I also use SSD MobileNet v1 (TensorFlow Object Detection API) with net-scale-factor=0.0078431372 (which corresponds to 2/255) and offsets=1;1;1, and it works great for me. The classification model I am trying to fix here is also a MobileNet v1, and I have tried a lot of combinations of net-scale-factor and offsets with no success.

Moreover, I think I do not need any preprocessing before feeding the RGB image into the network, since the required preprocessing is performed inside the tensorrt graph that I build using the tf_trt_models code.

For SSD networks (created with Tensorflow Object Detection API) , I think we need to specify the net-scale-factor and offset parameters because of this piece of code we usually use to generate the model engines:

namespace_plugin_map = {
    ....
    "Preprocessor": Input,
    "ToFloat": Input,
    "image_tensor": Input,
    ....
}
graph.collapse_namespaces(namespace_plugin_map)

On the other hand, for my classification mobilenet I do not collapse any namespace. However, I still think that there is something wrong with the preprocessing.

Do you think it is OK to use a float32 input placeholder? Maybe DeepStream is expecting a uint8 placeholder? I use float32, just like tf_trt_models does:

tf_input = tf.placeholder(tf.float32, [None, net.input_height, net.input_width, 3], name=input_name)

Thank you, any help is appreciated.

AastaLLL · September 10, 2020, 5:03am

Hi,

We have received the data via private message.
We will try it and reproduce the issue first.

Thanks.

AastaLLL · September 18, 2020, 2:48am

Hi,

We haven't received a response to the private message about the reproducible source.
Would you please check the message and share the data with us?

Thanks.

motyaedu · September 18, 2020, 7:22am

Hi,

Yes, sorry for my late response. You will have it in a couple of hours. I need to anonymize the test data.

Thanks.

motyaedu · September 21, 2020, 3:14pm

Hi,

I sent the data in the private message. Could you reproduce my issue?

Thanks in advance.

AastaLLL · October 13, 2020, 7:28am

Hi,

Thanks for your help.

We can reproduce this issue in our environment
and have passed the problem to our internal team.

We will update you once we get any feedback.

Thanks.

AastaLLL · October 23, 2020, 2:14am

Hi,

Thanks for your patience.

Here is an update on this issue:
The issue comes from JPEG decoding and the color conversion to RGBA.
We are still working on the fix. I will keep you updated once we make any progress.

Thanks.

H19012 · November 9, 2020, 6:09am

Is the JPEG decoding and the color conversion to RGBA fixed? I am having the same issues with accuracy.

AastaLLL · November 10, 2020, 5:03am

Hi,

Not yet.
Will update here for any progress.

Thanks.

xtianhb.glb · May 28, 2021, 6:01pm

Hi!
I am having a similar problem. Is there any update on configuration for binary classifiers?
(In my case I am using Triton)

xtianhb.glb · May 28, 2021, 7:18pm

If it is useful for anyone, I solved my issue by tweaking the scale_factor in the normalize section.

AastaLLL · June 3, 2021, 8:49am

Hi,

The preprocessing in Deepstream is y=net-scale-factor*(x-mean).
Please map the net-scale-factor value based on the preprocessing used in your training framework.

For example, the equation in PyTorch is y=(x-mean)/std.
So please set net-scale-factor=1/std for a PyTorch model.
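As a worked example of this mapping, consider the normalization that is commented out in the Python script earlier in the thread (`img *= 2.0 / 255.0; img -= 1.0`). Any preprocessing of the form y = a*x + b can be rewritten as y = a*(x - (-b/a)), which matches the DeepStream form with net-scale-factor=a and a mean of -b/a. The sketch below does this arithmetic; the helper names are hypothetical, and it assumes the mean is supplied through the offsets key with 0..255 pixel values.

```python
def nvinfer_params(a, b):
    """Rewrite y = a*x + b as y = scale*(x - offset); return (scale, offset)."""
    return a, -b / a

def nvinfer_preprocess(x, scale, offset):
    """The preprocessing formula stated above: y = net-scale-factor*(x - mean)."""
    return scale * (x - offset)

# Normalization to [-1, 1]: y = x * 2/255 - 1
scale, offset = nvinfer_params(2.0 / 255.0, -1.0)
print(f"net-scale-factor={scale:.10f}")
print(f"offsets={offset:.1f};{offset:.1f};{offset:.1f}")

# Check that both forms agree at the ends and the middle of the pixel range.
for x in (0.0, 127.5, 255.0):
    expected = x * 2.0 / 255.0 - 1.0
    assert abs(nvinfer_preprocess(x, scale, offset) - expected) < 1e-12
```

With these values, pixel 0 maps to -1 and pixel 255 maps to 1. A model that already normalizes inside its graph would instead need net-scale-factor=1.0 and zero offsets.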

However, the root cause of the original issue is related to JPEG decoding.
We are still working on it internally.

Thanks.

trild-vietnam · March 14, 2022, 3:21am

Hello @AastaLLL

Any update on this?
I get the same problem when using JPEG images as input.

AastaLLL · March 30, 2022, 5:16am

Hi, trild-vietnam

Would you mind filing a new topic for your issue?
Since we already have some fixes in the latest DeepStream package, you might not be hitting the same issue as in this topic.

Thanks.
