ActionRecognitionNet as SGIE for multiple person action recognition

• Hardware Platform: Jetson Orin
• DeepStream Version: 6.3
• JetPack Version: 5.1.2-b104
• TensorRT Version: 8.5.2.1

I’m trying to perform action recognition with DeepStream in Python. I ran the deepstream-3d-action-recognition application from the sample_apps directory using the Python bindings. That pipeline runs as streammux - preprocess - pgie (ActionRecognitionNet), but the application only works on a single object. I want to run the ActionRecognitionNet model as an SGIE so it performs inference on each detected object. The new pipeline is as follows:
Streammux - PGIE - Tracker - Preprocess - SGIE

The pipeline runs, but the SGIE doesn’t perform any inference. The configurations are as follows:

PGIE:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=peoplenet/resnet34_peoplenet_int8.etlt
labelfile-path=peoplenet/labels_peoplenet.txt
model-engine-file=peoplenet/resnet34_peoplenet_int8.etlt_b2_gpu0_int8.engine
int8-calib-file=peoplenet/resnet34_peoplenet_int8.txt
input-dims=3;544;960;0
uff-input-blob-name=input_1
batch-size=2
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=3
cluster-mode=2
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
#input-tensor-from-meta=1
output-tensor-meta=1

[class-attrs-all]
topk=20
nms-iou-threshold=0.5
pre-cluster-threshold=0.2
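
For completeness, the PGIE is created and pointed at this config roughly as below in my Python app (a minimal sketch; the config file name is a placeholder for the file above):

import sys
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("action-recognition-pipeline")

# "config_pgie_peoplenet.txt" is a placeholder name for the [property] file above
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
if not pgie:
    sys.stderr.write("Unable to create pgie\n")
pgie.set_property("config-file-path", "config_pgie_peoplenet.txt")
pipeline.add(pgie)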

Preprocess:

[property]
enable=1
target-unique-ids=3
operate-on-gie-id=1
network-input-order=0
process-on-frame=0
unique-id=2
gpu-id=0
maintain-aspect-ratio=0
symmetric-padding=0
processing-width=224
processing-height=224
scaling-buf-pool-size=6
tensor-buf-pool-size=6
network-input-shape=4;3;224;224
network-color-format=1
tensor-data-type=0
tensor-name=input_1
scaling-pool-memory-type=0
scaling-pool-compute-hw=1
scaling-filter=0
custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/gst-plugins/libcustom2d_preprocess.so
custom-tensor-preparation-function=CustomTensorPreparation

[user-configs]
pixel-normalization-factor=1

[group-0]
src-ids=0
operate-on-class-ids=-1
custom-input-transformation-function=CustomAsyncTransformation
process-on-all-objects=1
process-on-roi=0
input-object-min-width=100
input-object-min-height=100
input-object-max-width=500
input-object-max-height=500
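
The secondary preprocess element is configured the same way; note that target-unique-ids=3 here matches the gie-unique-id of the SGIE below, so the prepared tensor is routed to that engine. A sketch continuing from the snippet above (the config file name is again a placeholder):

# "config_preprocess_sgie.txt" is a placeholder name for the preprocess config above
preprocess_sgie = Gst.ElementFactory.make("nvdspreprocess", "preprocess-sgie")
if not preprocess_sgie:
    sys.stderr.write("Unable to create preprocess_sgie\n")
preprocess_sgie.set_property("config-file", "config_preprocess_sgie.txt")
pipeline.add(preprocess_sgie)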

SGIE:

[property]
gpu-id=0
tlt-encoded-model=./resnet18_2d_rgb_hmdb5_32.etlt
tlt-model-key=nvidia_tao
model-engine-file=./resnet18_2d_rgb_hmdb5_32.etlt_b4_gpu0_fp32.engine
force-implicit-batch-dim=1
labelfile-path=labels.txt
batch-size=4
process-mode=2
network-mode=0
gie-unique-id=3
network-type=100
operate-on-gie-id=2
input-object-min-width=64
input-object-min-height=64
model-color-format=1
classifier-async-mode=1
input-tensor-from-meta=1
output-tensor-meta=1
tensor-meta-pool-size=8
num-detected-classes=5
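
Since the SGIE runs with network-type=100 and output-tensor-meta=1, there is no built-in post-processing, so I check whether it produced anything by looking for its output tensor meta in a pad probe. Below is a rough sketch of that probe (my own code, assuming the standard pyds bindings; depending on the DeepStream version and preprocess mode the tensor meta may be attached at batch level or on each object, so it checks both places):

import pyds
from gi.repository import Gst

SGIE_UNIQUE_ID = 3  # gie-unique-id of the SGIE above

def sgie_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    def scan_user_meta(l_user, where):
        # Look for inference output tensors attached by the SGIE
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                if tensor_meta.unique_id == SGIE_UNIQUE_ID:
                    print("SGIE tensor meta found at", where)
            l_user = l_user.next

    # batch-level user meta
    scan_user_meta(batch_meta.batch_user_meta_list, "batch level")

    # per-object user meta
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            scan_user_meta(obj_meta.obj_user_meta_list,
                           "object %d" % obj_meta.object_id)
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

The probe is attached to the SGIE src pad with sgie1.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, sgie_src_pad_buffer_probe, 0).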

Does the ActionRecognitionNet model not work as an SGIE? Can you help me?

Your scenario is similar to our deepstream-pose-classification sample; you can refer to that first. Could you also attach your video source so that we can run it with the original deepstream-3d-action-recognition app first?

Thank you for your reply.

I had already run the original deepstream-3d-action-recognition without errors, but there the ActionRecognitionNet model runs as the PGIE. I want to run PeopleNet as the PGIE and the ActionRecognitionNet model as an SGIE. PeopleNet performs inference, but ActionRecognitionNet does not do any inference. Does the ActionRecognitionNet model only work as a PGIE, or does it not work as an SGIE? The elements are linked as follows:

streammux.link(preprocess)
preprocess.link(pgie)
pgie.link(queue1)
queue1.link(tracker)
tracker.link(queue2)
queue2.link(preprocess_sgie)
preprocess_sgie.link(queue3)
queue3.link(sgie1)
sgie1.link(tiler)
tiler.link(queue4)
queue4.link(nvvidconv)
nvvidconv.link(queue5)
queue5.link(nvosd)
nvosd.link(queue6)
queue6.link(nvvidconv_postosd)
nvvidconv_postosd.link(queue7)
queue7.link(caps)
caps.link(queue8)
queue8.link(encoder)
encoder.link(queue9)
queue9.link(rtppay)
rtppay.link(queue10)
queue10.link(sink)

It can work as an SGIE, but it may not be suitable for your scenario. It can only handle one person, or multiple people performing the same action.
If you use PeopleNet as the PGIE, the original images cached in preprocess are all of different sizes and backgrounds, so this model does not work in that scenario.

Got it, thank you. Then I need to follow a different approach to recognize the actions of more than one person, right? I will take the deepstream-pose-classification application you mentioned as a reference; I will need to port it to Python (pyds) and build on it. Do you have any advice on this?

You mainly need to refer to how the preprocess plugin batches each object detected by the PGIE, and then implement your own preprocessing algorithm.
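
In other words, you need to collect crops of each tracked object over time and only build the model input once a full temporal window for that object is available. A very rough sketch of that idea in Python (placeholder names and parameters, not the actual plugin code):

import collections
import numpy as np
import cv2  # any resize that matches the model's training preprocessing works

SEQ_LEN = 32            # temporal window length expected by the action model (placeholder)
INPUT_H, INPUT_W = 224, 224

# one ring buffer of preprocessed crops per tracker id
object_buffers = collections.defaultdict(lambda: collections.deque(maxlen=SEQ_LEN))

def push_object_crop(track_id, crop_bgr):
    """Resize/normalize one object crop and append it to that object's buffer."""
    crop = cv2.resize(crop_bgr, (INPUT_W, INPUT_H)).astype(np.float32) / 255.0
    object_buffers[track_id].append(crop)

def pop_ready_sequences():
    """Yield (track_id, tensor) for every object that has a full temporal window.

    The exact layout (separate time axis vs. frames stacked on the channel axis)
    depends on which variant of the action model you exported.
    """
    for track_id, frames in object_buffers.items():
        if len(frames) == SEQ_LEN:
            # HWC frames -> (SEQ_LEN, 3, H, W)
            yield track_id, np.stack(frames).transpose(0, 3, 1, 2)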

Thank you for your support. Good work!
