3D action recognition

TAO’s 3D action recognition can’t recognise each individual person’s activity, right?
For example, if one person is running and another person is biking in a short clip, the 3D action recognition algorithm can’t recognise their activities individually.
If I want to recognise individual activities, do you have a different algorithm?

The action recognition model in TAO does not do detection.
It just classifies the input sequence.
So if multiple people in one clip are doing the same action, it might work.
If multiple people in one clip are doing different actions, it can only give one classification, and that classification may be wrong.

That’s also what I had in mind.

What if I first detect all the persons in sight and then apply action recognition to their image sequences separately?
That means, if N people are detected, action recognition needs to run N times in order to determine each person’s action status.

Of course, the training data has to be modified. I don’t really want an image sequence containing multiple persons taking different actions, like some of the samples in HMDB. Instead, each image sequence should contain only one person taking one action.

I’ve been trying to do this, but I’m not sure it makes sense, as the recognition task is often done using skeleton keypoints, which ActionRecognitionNet doesn’t seem to use.
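The per-person pipeline described above (detect N people, then classify N cropped sequences) can be sketched in plain Python. This is only an illustration of the data flow, not TAO or DeepStream API code; `detect_persons` (a detector plus tracker returning `(track_id, bbox)` pairs) and `classify_clip` (the action-recognition model) are hypothetical stand-ins.

```python
from collections import defaultdict, deque

SEQ_LEN = 32  # clip length the action-recognition model expects


def run_per_person_action_recognition(frames, detect_persons, classify_clip):
    """Buffer SEQ_LEN crops per tracked person, then classify each full clip.

    frames: iterable of video frames
    detect_persons: frame -> list of (track_id, bbox) pairs (hypothetical)
    classify_clip: list of SEQ_LEN crops -> action label (hypothetical)
    Returns {track_id: [label, ...]}, one label per completed clip.
    """
    buffers = defaultdict(lambda: deque(maxlen=SEQ_LEN))
    results = defaultdict(list)
    for frame in frames:
        for track_id, bbox in detect_persons(frame):
            crop = frame  # in a real pipeline: crop the frame to bbox
            buffers[track_id].append(crop)
            if len(buffers[track_id]) == SEQ_LEN:
                # One classification per person per 32-frame segment.
                results[track_id].append(classify_clip(list(buffers[track_id])))
                buffers[track_id].clear()  # start the next non-overlapping clip
    return dict(results)
```

The key point is that the classifier is invoked once per person per segment, so N tracked people cost N inference calls per 32-frame window.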

It makes sense. A similar pipeline can be found in the pose classification network.
Refer to


@Morganh You mean we apply a human detection model first to detect each individual person in a bounding box.
Then the corresponding bounding boxes across a series of images (for example, 32 images) are fed to the action recognition model, is that right? That way we get action recognition for each person in the video. Is this possible in DeepStream? If so, that would be fantastic; we could even swap the model, i.e. train our own model (for example, AVA action detection), convert it to ONNX, and deploy it.
So my question is: in DeepStream, how can we collect a series of bounding boxes over 32 images (one segment of a short clip for action recognition) and feed them to the model? Which plugin do we need to look at and modify?
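One way to think about the accumulation step asked about above is a per-object ring buffer keyed by the tracker’s persistent object ID. The sketch below is plain Python, not DeepStream API code; in an actual DeepStream pipeline the `push` calls would come from a pad probe placed after the tracker, using each object’s `object_id` from its object metadata, which is an assumption about where you would hook it in.

```python
from collections import deque


class ClipBuffer:
    """Accumulates per-object ROIs until a full clip is available."""

    def __init__(self, clip_len=32):
        self.clip_len = clip_len
        self.rois = {}  # object_id -> deque of (frame_num, bbox)

    def push(self, object_id, frame_num, bbox):
        """Record one bbox for one object; return a full clip or None.

        When the buffer for object_id reaches clip_len entries, the
        completed clip (a list of (frame_num, bbox) tuples) is returned
        and the buffer is reset for the next segment.
        """
        buf = self.rois.setdefault(object_id, deque(maxlen=self.clip_len))
        buf.append((frame_num, bbox))
        if len(buf) == self.clip_len:
            clip = list(buf)
            buf.clear()
            return clip
        return None
```

Each object accumulates independently, so people who enter the scene at different times complete their 32-frame segments at different times.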

@silentjcr Yes, you are right: we train on individuals, but the model can be applied to a group of people in an image at deployment. I normally train with only one person in the video.

@Morganh Can we apply the pose classification network model in DeepStream?

@Morganh If we run on a regular GPU like an RTX 3080 or RTX 4070, what FPS can we expect?
And if we run on AGX Xavier, what is the FPS? I can’t find any benchmarks.

Currently, in DeepStream, there is already a pose classification sample. It detects each person and their pose, along with the pose classification.
Refer to https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/apps/tao_others/deepstream-pose-classification

For FPS, please see the NGC model card. For example: Action Recognition Net | NVIDIA NGC


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.