DeepStream retail application

Could anyone point me to the most straightforward way to detect, label, and count bottles with DeepStream, assuming DeepStream is a good fit for this at all?
If it is not, please explain why it might not be a good idea.
Any feedback will be appreciated.

Do you want to run face detection, car detection, license plate recognition, or something in another field?
A DeepStream model can be specific to one application.

Thank you for your response.
The interest comes from a request to recognize soda bottles, their types, and their quantity, and from the question of whether this could be processed with DeepStream.
Moreover, since I got access to the Transfer Learning Toolkit, I will be researching other options. My interests include, but are not limited to, face detection, car detection, and plate recognition. However, the reason I started the thread is that I was asked to put together a proof of concept for soda bottle tracking/detection, and the forum seems a good place for brainstorming.


There are two kinds of models in DeepStream: detection and classification.
For example, if you want to output the car type:
[i]>> The primary model gives you the location of each car.

The secondary model tells you which car type each detected box contains.[/i]

For your use case, it’s recommended to train a bottle detector first and combine it with a bottle-type classifier.
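To make the detector-plus-classifier idea concrete, here is a minimal sketch in plain Python of how per-type bottle counts could be derived once both stages have run. It does not use the DeepStream API: the `boxes` input and the `classify` callable are hypothetical stand-ins for the primary detector output and the secondary classifier, and the toy classifier is made up purely for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# (left, top, right, bottom) in pixels; a hypothetical box representation.
Box = Tuple[int, int, int, int]


def count_bottles_by_type(
    boxes: Iterable[Box],
    classify: Callable[[Box], str],
) -> Counter:
    """Count detected bottles per type label.

    `boxes` stands in for the primary (detection) model output and
    `classify` for the secondary (classification) model. Both are
    placeholders, not DeepStream calls.
    """
    return Counter(classify(box) for box in boxes)


def toy_classifier(box: Box) -> str:
    """Toy rule: label a box by its width only. Illustration, not a real model."""
    left, _, right, _ = box
    return "large" if right - left > 80 else "small"


if __name__ == "__main__":
    # Made-up detections for a single frame.
    frame_boxes = [(10, 20, 60, 200), (80, 25, 130, 210), (150, 30, 260, 220)]
    counts = count_bottles_by_type(frame_boxes, toy_classifier)
    for bottle_type, quantity in counts.items():
        print(bottle_type, quantity)
```

In a real pipeline the counting step would read the labels attached to each detected object instead of calling a Python function, but the detect, then classify, then count structure stays the same.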
Please check this tutorial for how to retrain a model:


Thank you for your response.
Upon investigation, it turned out that the first step is to prepare the input dataset in KITTI format.
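For reference, a KITTI object label is a text line with 15 space-separated fields: class name, truncation, occlusion, alpha, the 2D box (left, top, right, bottom), three 3D dimensions, three 3D location values, and a rotation. The sketch below writes such a line for a 2D box. Setting all non-box fields to zero is an assumption that suits 2D detection training but should be checked against the toolkit documentation; the class name `soda_bottle` is made up.

```python
def kitti_label_line(
    class_name: str, left: float, top: float, right: float, bottom: float
) -> str:
    """Format one object as a 15-field KITTI label line.

    Only the class name and the 2D bounding box carry information here.
    Truncation, occlusion, alpha, 3D dimensions, 3D location, and rotation
    are all written as zero (an assumption for 2D-only datasets).
    """
    if " " in class_name:
        raise ValueError("class name must not contain spaces")
    if right <= left or bottom <= top:
        raise ValueError("box must have positive width and height")
    bbox = f"{left:.2f} {top:.2f} {right:.2f} {bottom:.2f}"
    # type, truncated, occluded, alpha, bbox(4), dimensions(3), location(3), rotation_y
    return f"{class_name} 0.00 0 0.00 {bbox} 0.00 0.00 0.00 0.00 0.00 0.00 0.00"


if __name__ == "__main__":
    # One hypothetical bottle annotation; one such line per object per image.
    print(kitti_label_line("soda_bottle", 10, 20, 60, 200))
```

Each image gets one label file with one such line per annotated bottle.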

And what about face recognition?
will be the most straightforward direction for the face recognition?


You can give it a try.
If you want a face detection sample, DetectNet should be enough:



A friend of mine asks how well Xavier would handle object detection.

How would AGX Xavier perform if they have 6 cameras and want to run multiple detection models on it?

It is difficult for me to answer this question,
because all I have seen so far is that the examples for 4 and 30 sources run pretty fast with all default settings and inputs from file.

On the other hand, I once directed an RTSP GStreamer stream from one Xavier to another Xavier’s DeepStream application at high resolution and 30 fps, and it reduced the processing performance for that single stream to less than 10 fps, as far as I remember. However, I did not look closely into the details and the configuration file parameters.
To sum up, my question to more knowledgeable people is:

Will Xavier be a good fit for detection on 6 cameras with multiple models?

However, what I can do is try aggregating the streams of 4 cameras using GStreamer, RTSP, and direct CSI. That should allow a rough estimate of the likely performance, at least with the default model.
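As a rough way to frame that estimate, the arithmetic below works out the aggregate frame rate a multi-camera setup would need and the resulting time budget per frame. It is only a back-of-the-envelope sketch: 30 fps per camera is an assumption carried over from the RTSP test above, and the measured throughput passed to `max_cameras` is a placeholder to be replaced by a real measurement.

```python
import math


def required_throughput(num_cameras: int, fps_per_camera: float) -> float:
    """Total frames per second the pipeline must process for all cameras."""
    return num_cameras * fps_per_camera


def per_frame_budget_ms(num_cameras: int, fps_per_camera: float) -> float:
    """Average time available per frame, in milliseconds, across all models."""
    return 1000.0 / required_throughput(num_cameras, fps_per_camera)


def max_cameras(measured_total_fps: float, fps_per_camera: float) -> int:
    """How many cameras a measured aggregate throughput could sustain."""
    return math.floor(measured_total_fps / fps_per_camera)


if __name__ == "__main__":
    cameras, fps = 6, 30.0  # 30 fps per camera is an assumption
    print("required total fps:", required_throughput(cameras, fps))
    print("budget per frame (ms):", round(per_frame_budget_ms(cameras, fps), 2))
    # Placeholder value; substitute the throughput measured on the 4-camera test.
    print("cameras sustainable at 100 fps total:", max_cameras(100.0, fps))
```

Comparing the throughput measured with 4 aggregated cameras against the required total gives a first indication of whether 6 cameras are realistic, before adding the cost of extra models.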