Parallel execution of branches

• Hardware Platform: Jetson
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: 7.0
• Issue Type: Questions

Hi,

I’m currently developing an IVA application for the Jetson.
I want to use DeepStream to fully utilize the underlying hardware. The application will consist of a few CV tasks, some of which are independent of each other.

  1. I want to split the pipeline into branches that can be processed independently, but in the end I want the results to be assigned to the corresponding frame
  2. Not every frame must be inferred - if the pipeline is overloaded, the older frames may be dropped
  3. I’d like to draw the inference results and expose them on an output RTSP stream

The main flow:

[RTSP] ----> [Detector] ---> [Tracker] ---> [First classifier]
        |                               +-> [Additional processing(this will push downstream different buffer)] ---> [Pose estimation] ---> [Classifier]
        |                                                                                                        +-> [Cascade detector] ---> [Classifier]
        +-->[Scene classifier]

* Each component will produce metadata, mostly in a custom format (every one of them will operate in place)
** Each component will work with batches that come from multiple camera streams
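For illustration, the split described above could be written as a gst-launch-1.0 style description using the standard GStreamer `tee` and `queue` elements. This is only a sketch under assumptions: the branch contents (`SOURCE_AND_MUX`, `DETECTOR_BRANCH`, `SCENE_CLASSIFIER_BRANCH`) are placeholders rather than real elements, the helper names are made up for this example, and the queue size of 4 buffers is arbitrary. It covers the split and the frame dropping only, not how results from the branches get back onto one frame.

```python
# Sketch only: builds a gst-launch-1.0 style description, it does not run a pipeline.
# SOURCE_AND_MUX and the *_BRANCH strings are placeholders, not real element names.

def leaky_queue(max_buffers=4):
    # leaky=downstream makes a full queue drop its oldest buffers instead of blocking.
    # Setting the byte and time limits to 0 leaves only the buffer-count limit active.
    return (
        f"queue leaky=downstream max-size-buffers={max_buffers} "
        "max-size-bytes=0 max-size-time=0"
    )


def split_into_branches(source, branches, tee_name="t"):
    """Return a description in which `source` feeds a tee and every branch
    starts with its own leaky queue, so each branch gets its own streaming thread."""
    parts = [f"{source} ! tee name={tee_name}"]
    for branch in branches:
        parts.append(f"{tee_name}. ! {leaky_queue()} ! {branch}")
    return " ".join(parts)


description = split_into_branches(
    "SOURCE_AND_MUX",
    ["DETECTOR_BRANCH", "SCENE_CLASSIFIER_BRANCH"],
)
print(description)
```

The queue after each tee pad is what decouples the branches: without it, all branches would run in the thread of the upstream element.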

I was looking at the nvinfer component and saw that it has an option to run a classifier in asynchronous mode. I’m wondering whether it would fit this use case: with asynchronous inference, how can I ensure that inference has completed for every frame that goes into the output pipeline?

Could you please advise how I should build the workload using DeepStream to ensure concurrent execution of independent tasks?

Thanks

The inference result (for any model, such as a detector, classifier, or segmentation model) is passed downstream as DsMetaData, which is attached to the GstBuffer that carries the frames.

The definition of ‘classifier-async-mode’ of nvinfer is in https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html#gst-nvinfer-file-configuration-specifications
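As that page describes it, in async mode the buffer is pushed downstream without waiting for the classifier, and the result is attached to a later buffer once it is available, matched through the tracker ID. The toy model below is not nvinfer's implementation; it is a plain-Python sketch, with a made-up `latency` parameter and a stand-in `classify` function, that shows why the first frames of an object can leave the element without a label.

```python
def attach_async(frames, classify, latency=2):
    """frames: list of lists of tracker ids seen in each frame.
    classify: function tracker_id -> label (stands in for the model).
    A request issued at frame i only becomes visible at frame i + latency."""
    pending = {}  # tracker_id -> frame index at which its label becomes available
    cache = {}    # tracker_id -> label that has already been delivered
    output = []
    for index, object_ids in enumerate(frames):
        # Deliver results whose simulated latency has elapsed.
        for tid, ready_at in list(pending.items()):
            if ready_at <= index:
                cache[tid] = classify(tid)
                del pending[tid]
        # Queue a request for every object that is neither classified nor pending.
        for tid in object_ids:
            if tid not in cache and tid not in pending:
                pending[tid] = index + latency
        # The frame leaves immediately with whatever labels are known so far.
        output.append({tid: cache.get(tid) for tid in object_ids})
    return output


frames = [[1], [1, 2], [1, 2], [1, 2]]
result = attach_async(frames, classify=lambda tid: f"label-{tid}", latency=2)
for index, labels in enumerate(result):
    print(index, labels)
```

In this model, object 1 has no label on frames 0 and 1, and object 2 has none on frames 1 and 2. A pipeline that needs a result on every outgoing frame would have to wait for inference instead, which is what the synchronous mode does.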

I don’t understand the pipeline you list. Which of them are inference models? Is ‘[Additional processing(this will push downstream different buffer)]’ an inference too? How many inference models are there in your pipeline? What are the input and output of the models? What is the relationship between these models? Have you evaluated the performance of the models on Jetson before you use them with DeepStream?

Thank you for the response!

Which of them are inference models?
How many inference models are there in your pipeline?
What is the relationship between these models?

There are 4 main tasks in this pipeline. They are independent, although several of them share the detector and the additional processing step. I have added numbers to indicate which models are unique.

  1. Scene classifier (1) - takes the whole frame and attaches classification results
  2. Detector (2) -> Classifier (3)
  3. Detector (2) -> Additional processing -> Pose Estimation (4)-> Classifier (5)
  4. Detector (2) -> Additional processing -> Face Detector (6)-> Classifier (7)

There are 7 different inference models.
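The four chains above can be written down as plain data to check the count and to see which stages are shared. This is only a bookkeeping sketch: the dictionary keys (`scene`, `person`, `pose`, `face`) and the helper names are made up for the example.

```python
# Task chains as listed above; numbers in parentheses are the model ids from the list.
TASKS = {
    "scene": ["Scene classifier (1)"],
    "person": ["Detector (2)", "Classifier (3)"],
    "pose": ["Detector (2)", "Additional processing", "Pose Estimation (4)", "Classifier (5)"],
    "face": ["Detector (2)", "Additional processing", "Face Detector (6)", "Classifier (7)"],
}


def unique_models(tasks):
    # Only numbered stages are models; "Additional processing" is not inference.
    return sorted({s for chain in tasks.values() for s in chain if s.endswith(")")})


def shared_stages(tasks):
    # Stages that appear in more than one chain only need to run once per frame.
    counts = {}
    for chain in tasks.values():
        for stage in chain:
            counts[stage] = counts.get(stage, 0) + 1
    return [stage for stage, n in counts.items() if n > 1]


print(len(unique_models(TASKS)))  # 7
print(shared_stages(TASKS))       # ['Detector (2)', 'Additional processing']
```

The shared stages mark where the pipeline would have to fan out: once after the detector and once after the additional processing step.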

Is ‘[Additional processing(this will push downstream different buffer)]’ an inference too?

Additional processing is a step that produces new frames to be consumed by Pose Estimation and Face Detector. It needs to produce a new image that will not be displayed anywhere and is used only for inference, so I am also considering attaching this image as custom metadata. It’s not an inference step, but it’s required by Pose Estimation (4) and Face Detector (6).

What are the input and output of the models?

  • Scene classifier (1) - takes an RGB frame and should return probabilities for each class as well as some raw data from specified tensors (the output of this step will be custom metadata)
  • Detector (2) - standard detector input / output
  • Classifier (3) - standard classification of detected objects (people only)
  • Pose Estimation (4) - takes an RGB frame and returns a list of keypoints for each detected skeleton
  • Classifier (5) - takes a sequence of skeletons and returns standard classification output
  • Face Detector (6) - takes an RGB frame and returns the positions of detected faces
  • Classifier (7) - standard classification of faces

Have you evaluated the performance of the models on Jetson before you use them with DeepStream?

Yes, we have run performance tests: we benchmarked the models in TensorRT on a Jetson AGX in MAXN mode. The slowest model in our workload achieved 60 FPS with batch size = 1 (the final solution will introduce batching to improve this result).

The 7 models can be used in the same pipeline. You can refer to the sample code in /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test2, which uses 4 models: one model to detect cars and persons, and three classifiers to identify the car’s color, type and manufacturer.

The only problem is what is your ‘Additional processing’? Is it the pre-processing for ‘Pose estimation’ and ‘face detector’? If so, what kind of pre-processing is needed? Scaling, color format conversion, normalization or any other pre-processing?
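As a rough illustration of the serial layout described for that sample (one detector followed by several classifiers), the fragment below composes a gst-launch-1.0 style chain. It is a sketch under assumptions, not the sample's actual code: `nvinfer`, its `config-file-path` property and `nvtracker` are DeepStream elements, but the helper name and the config file names are hypothetical, and the tracker's own properties are omitted.

```python
# Sketch only: composes a gst-launch-1.0 style fragment, it does not start a pipeline.
# The config file names passed in below are hypothetical placeholders.

def cascade(primary_config, secondary_configs):
    """Chain one primary nvinfer, a tracker, and any number of secondary nvinfer
    instances in series, the way a single-branch pipeline is laid out."""
    stages = [f"nvinfer config-file-path={primary_config}", "nvtracker"]
    stages += [f"nvinfer config-file-path={cfg}" for cfg in secondary_configs]
    return " ! ".join(stages)


fragment = cascade("detector.txt", ["color.txt", "type.txt", "make.txt"])
print(fragment)
```

In a chain like this every buffer passes through the models one after another, so the metadata from all of them ends up on the same frame, but the models sit in series in the pipeline rather than in separate branches.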

The 7 models can be used in the same pipeline. You can refer to the sample code in /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test2, which uses 4 models: one model to detect cars and persons, and three classifiers to identify the car’s color, type and manufacturer.

Thanks for the sample, but I’m afraid it would not fit our case. We would like to execute whole branches in parallel, and this example shows only how to run one model asynchronously.

The only problem is what is your ‘Additional processing’? Is it the pre-processing for ‘Pose estimation’ and ‘face detector’? If so, what kind of pre-processing is needed? Scaling, color format conversion, normalization or any other pre-processing?

The ‘Additional processing’ step consists of several sub-steps and will be custom developed.
During this stage we will produce a new image that we plan to attach as buffer metadata. The new image will be created from specified parts of the original frame; this is a requirement of our project.

Why do you say so?

Why do you insert the new images into metadata? How will you use the metadata downstream? Will these images be used as the input of ‘Pose Estimation’ and ‘Face Detector’?

OK, so maybe I misunderstood something :) I will give it a try.

For Pose Estimation we still need to develop custom input/output parsers for our models, as in the apps/sample_apps/deepstream-infer-tensor-meta-test sample, and provide nvinfer with functions that correctly parse the input/output tensors.

So it is just a part of inference; you don’t need to list it as a separate step in your DeepStream pipeline.