Building a guide to implementing multi-step inference models

Hi!
I’m working on writing a simple guide for people like me who bought a Jetson board (an AGX Xavier in my case) and don’t know exactly how to deploy some out-of-the-box pretrained models in DeepStream 5.0.
I’ll add some details about my current setup and then continue with the idea.
At the moment I’m using:
• Hardware Platform: Jetson AGX Xavier
• DeepStream 5.0
• JetPack 4.4
• TensorRT 7.1
This was the latest default configuration offered for the Xavier by SDK Manager, so I started from there.
My main problems at this point are:

  • How should I start building a DeepStream app? For example, is there any guide or tutorial on how to customize the default app? I was playing with it and got some good results implementing the basics of the default samples.
  • How should I train?
  • Is there any guide, or can anyone explain to me, how to use PeopleNet and train it with a custom dataset? I already have my data with the labels prepared as described in this guide, but from that point on I don’t know exactly how to move forward.
  • Is there any way you can guide me, or give me a step-by-step, on using, for example, two YOLOv3 models (the defaults from the examples) to detect in cascade? For example, how would I use the default YOLOv3 to detect every person or dog, and then use a secondary detector to check whether the dog is a golden retriever or a corgi, or, in the case of people, whether each person is wearing a white jacket? (Of course, in this case the secondary model is trained with custom data like this sample.)
    I found this repo with an example of two-step inference, but I don’t understand very well how I can modify it to achieve an example like the one mentioned above.
  • Lastly, and not less important, is there any good strategy for migrating our models to TensorRT, and how can I check the performance improvement?

I hope you can help me. Sorry for my English; I know there are a few posts about these subjects and I have read some of them. What I’m looking for here is some advice and guidance so I can put together documentation for others who are starting out with this and make it easier for them.
Thanks in advance for your time.
All the best,

Franco.

How should I start building a DeepStream app? For example, is there any guide or tutorial on how to customize the default app? I was playing with it and got some good results implementing the basics of the default samples.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_app_architecture.html#
" Sample Application Source Details" section in https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html#page/DeepStream_Development_Guide/deepstream_quick_start.html#wwpID0E03C0HA

How should I train?

If you want to modify the model, you need to re-train it.

Is there any guide, or can anyone explain to me, how to use PeopleNet and train it with a custom dataset? I already have my data with the labels prepared as described in this guide, but from that point on I don’t know exactly how to move forward.

You can start with the documentation at https://developer.nvidia.com/transfer-learning-toolkit
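
For orientation, the PeopleNet fine-tuning flow with the Transfer Learning Toolkit (TLT 2.0, the release that matches DeepStream 5.0) looks roughly like the sketch below. It assumes KITTI-format labels (which is what the labelling guide you mention produces), a detectnet_v2 training spec, and an NGC API key; the exact spec contents and command flags should be taken from the TLT documentation, and all file names here are placeholders.

```
# Inside the TLT 2.0 container (nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3)

# 1. Convert KITTI-format images/labels into TFRecords for detectnet_v2
tlt-dataset-convert -d dataset_convert_spec.txt -o tfrecords/my_dataset

# 2. Fine-tune: the training spec points pretrained_model_file at the PeopleNet
#    .tlt weights downloaded from NGC and at the TFRecords from step 1
tlt-train detectnet_v2 -e detectnet_v2_train_spec.txt -r results/ -k $NGC_KEY

# 3. Export to .etlt so DeepStream's nvinfer can consume it
tlt-export detectnet_v2 -m results/weights/model.tlt -o my_peoplenet.etlt -k $NGC_KEY
```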

Is there any way you can guide me, or give me a step-by-step, on using, for example, two YOLOv3 models (the defaults from the examples) to detect in cascade?

Yes, you can refer to the back-to-back detectors sample.
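
The back-to-back sample wires two nvinfer elements together in C. If you stay with the stock deepstream-app instead, the same cascade idea is usually expressed in the app config by adding a secondary GIE that operates only on objects produced by the primary one. The excerpt below is only a sketch of that pattern: the config file names are placeholders, the class IDs depend on your label file, and whether a secondary detector (as opposed to a classifier) fits your case is exactly what the back-to-back sample demonstrates.

```
# deepstream-app config (excerpt) -- primary YOLOv3 plus a secondary model
# that runs only on the objects the primary detector found

[primary-gie]
enable=1
gie-unique-id=1
config-file=config_infer_primary_yoloV3.txt     # ships with the YOLO sample

[secondary-gie0]
enable=1
gie-unique-id=2
operate-on-gie-id=1          # take objects from the primary GIE
operate-on-class-ids=0       # e.g. only the "person" (or "dog") class ID from your labels
batch-size=16
config-file=config_infer_secondary_white_jacket.txt   # placeholder: your custom model
```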

Lastly, and not less important, is there any good strategy for migrating our models to TensorRT, and how can I check the performance improvement?

TensorRT supports ONNX, Caffe, and UFF models. You can convert your model to one of these formats (ONNX is recommended), and then use the trtexec tool provided in the TensorRT package to profile your model's performance.
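
For example, on a JetPack 4.4 install trtexec lives under /usr/src/tensorrt/bin/, and a basic run like the sketch below builds an engine from an ONNX model and prints latency and throughput numbers at the end. The file names are placeholders and the flag set should be checked with `trtexec --help` on your TensorRT version.

```
# Build an FP16 engine from an ONNX model and save it
/usr/src/tensorrt/bin/trtexec --onnx=my_model.onnx --fp16 --saveEngine=my_model_fp16.engine

# Re-benchmark the saved engine; latency and throughput are reported at the end
/usr/src/tensorrt/bin/trtexec --loadEngine=my_model_fp16.engine --iterations=100
```

Comparing the reported numbers for FP32 vs. FP16 (or INT8, if you have calibration data) is the usual way to check the improvement.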


Thanks!
I’ll take a look.
Just in case, let me ask now: if I use the PeopleNet example and add a secondary detector, would it work out of the box if I ask it to check whether the detected person is wearing a hat, or would it need some more work?
All the best

Sorry! I didn’t catch your point. Could you elaborate on why it may not work?

I’m asking whether I can add a secondary detector to the PeopleNet example.
The idea is the following:
I want to detect where the people are in the frame, and then, on those detections, check whether each person is wearing a green hat.
I’m looking for some help to make this work. At this point I have figured out how to run the PeopleNet example on the Xavier, and it runs smoothly, but I’m not yet able to add the secondary detector.
Thanks in advance