Different inference accuracy when using a CSI camera vs. a video file as source in DeepStream

• Hardware Platform (Jetson / GPU) : Jetson Nano
• DeepStream Version : 5.1
• Jetpack version : 4.5.1
• Issue Type(questions, new requirements, bugs) : Question


We trained an EfficientNet-B1 model with TAO and we are using it as the primary classifier to analyze video recorded with a CSI camera connected to a Jetson Nano (source0 type=5).
In that process we save two things:

  1. The frames coming from the captured video, as JPG files.
  2. The prediction obtained in our customized DeepStream app for each frame.

Later, we generated a video file (.avi) from those frames and fed the resulting video into the same customized DeepStream app to do inference. This time, instead of using a CSI camera, we set a video file as the source (source0 type=2).
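For reference, the two runs differ only in the [source0] group of the deepstream-app configuration. A sketch of the two variants, using the standard deepstream-app source-group properties (the resolution, frame rate, and file path below are hypothetical placeholders, not taken from our actual config):

```ini
# Analysis 1: live CSI camera source
[source0]
enable=1
type=5                  # 5 = CSI camera (nvarguscamerasrc)
camera-width=1280       # placeholder values
camera-height=720
camera-fps-n=30
camera-fps-d=1
camera-csi-sensor-id=0

# Analysis 2: recorded video file (swap in instead of the block above)
#[source0]
#enable=1
#type=2                 # 2 = URI / file source
#uri=file:///tmp/out.avi
#num-sources=1
```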

Surprisingly, analyzing the video file source gives significantly better inference results than analyzing video captured directly from the CSI camera.

Do you know what could be the cause for this behaviour?

Thanks in advance.

I don’t understand what this is.

The two sources are different.

When I said “customized deepstream app” I was referring to a deepstream-app we modified to access the classifier metadata. With that modification we save the prediction for each frame so that we can process that information later. The problem we have is that DeepStream returned one prediction for the frames captured by the CSI camera, and after generating a video file (.avi) from those exact frames, DeepStream returned a different prediction.

Why can different sources make DeepStream produce different predictions for the same images?

What does “prediction” mean?

They are different data. I don’t know how you convert the images into video, but they cannot be the same data.

With “prediction” I mean the class that DeepStream returns for each analyzed frame. For example, for frame 0 DeepStream returns that it is an apple with 0.8 probability.

We generated the video using this GStreamer command:

gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" caps="image/jpeg,framerate=30/1" ! jpegdec ! x264enc ! avimux ! filesink location="out.avi"
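One way to rule out re-encoding loss in that pipeline would be to mux the saved JPEGs into the AVI without decoding and re-encoding them, producing an MJPEG AVI instead of H.264. A sketch, untested on this exact setup (the output file name is arbitrary):

```shell
# Mux the saved JPEG frames directly into an AVI container (MJPEG),
# skipping the lossy jpegdec -> x264enc round trip.
gst-launch-1.0 multifilesrc location="/tmp/%d.jpg" \
    caps="image/jpeg,framerate=30/1" ! jpegparse ! avimux ! \
    filesink location="out_mjpeg.avi"
```

Note the frames are still JPEGs, so they already differ from the raw camera output, but this at least removes the extra H.264 generation loss.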

As the video is created with the same frames got from the camera, we thought that deepstream should get same results/predictions in both cases (video as source and csi camera as source).

x264enc means the data is compressed with the AVC algorithm (H.264: Advanced Video Coding for generic audiovisual services). AVC encoding is not a lossless algorithm.

I understand that, but in that case it would make sense for the analysis of the .avi video to perform worse than the analysis of the live CSI camera video. However, performance is better.

To summarize:

  1. We have a video captured with the CSI camera that is analyzed using DeepStream (analysis 1).
  2. While that video is analyzed, we save one image per frame.
  3. A new video is generated in .avi format from those saved images.
  4. We analyze the new video (.avi) using DeepStream (analysis 2).

We think it does not make sense that analysis 2 performs better than analysis 1 because, according to your last response, the data compression should make analysis 2 perform worse, not better.
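To quantify how much the two runs disagree rather than comparing by eye, the two per-frame prediction logs can be joined on the frame id. A minimal sketch, assuming a hypothetical "&lt;frame_id&gt; &lt;class&gt; &lt;prob&gt;" line format (the file names, paths, and format are made up for illustration; the tiny example logs are fabricated to show the comparison):

```shell
# Fabricated example log from analysis 1 (CSI camera run).
cat > /tmp/pred_csi.txt <<EOF
0 apple 0.80
1 apple 0.75
2 pear 0.60
EOF
# Fabricated example log from analysis 2 (.avi run).
cat > /tmp/pred_avi.txt <<EOF
0 apple 0.82
1 pear 0.55
2 pear 0.61
EOF
# Join the logs on frame id and count frames whose class label changed.
join /tmp/pred_csi.txt /tmp/pred_avi.txt | \
    awk '$2 != $4 { n++ } END { printf "%d frames differ\n", n + 0 }'
```

With the example logs above this reports 1 differing frame (frame 1 flips from apple to pear), and on the real logs it gives a per-frame disagreement count instead of an overall impression.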

Does somebody know why this behaviour happens?

What does your performance mean? The accuracy of the bbox?

With “performance” we refer to the accuracy of the inference: whether the prediction for each frame is correct or not.