Vision Transformer or other Temporal Vision Models

Please provide complete information as applicable to your setup.

• Hardware: Orin AGX 64GB
• DeepStream version: 7 or 6.2

Hi!

I’m wondering what the best way is to run spatio-temporal models inside DeepStream, such as RNNs or vision transformers that aggregate information over a longer time span. I want to move away from static-image object detectors. I know of the following sample, but it seems outdated and I’m not sure how easy it is to adapt: DeepStream 3D Action Recognition App — DeepStream 6.2 Release documentation (nvidia.com)

Is there a reference implementation or sample for this?

Thanks!

The DS 3D action sample is available in the latest DeepStream 7.0.
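For readers new to the idea, clip-based temporal models generally consume a window of consecutive frames per stream rather than a single image. The following is a minimal plain-Python sketch of such a per-stream sliding window. It is not taken from the DeepStream sample and uses no DeepStream API; `ClipBuffer`, `clip_len`, and `stride` are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class ClipBuffer:
    """Per-stream sliding window over frames (hypothetical helper, not a DeepStream API)."""

    def __init__(self, clip_len=4, stride=2):
        self.clip_len = clip_len  # number of frames in one clip
        self.stride = stride      # new frames required between two emitted clips
        # One bounded queue per stream; old frames fall out automatically.
        self._frames = defaultdict(lambda: deque(maxlen=clip_len))
        self._since_emit = defaultdict(int)

    def push(self, stream_id, frame):
        """Add a frame; return a full clip (list of frames) when one is due, else None."""
        buf = self._frames[stream_id]
        buf.append(frame)
        self._since_emit[stream_id] += 1
        if len(buf) == self.clip_len and self._since_emit[stream_id] >= self.stride:
            self._since_emit[stream_id] = 0
            return list(buf)
        return None


if __name__ == "__main__":
    buf = ClipBuffer(clip_len=4, stride=2)
    for i in range(8):
        clip = buf.push(stream_id=0, frame=i)  # integers stand in for image frames
        if clip is not None:
            print(clip)
```

Each returned clip would then be stacked into one input tensor for the temporal model. Overlapping windows (stride smaller than clip length) trade extra compute for more frequent predictions.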

Thank you. Does that mean this is the only example available for RNNs/temporal models/…?

Pose Classification | NVIDIA NGC can be another sample.

deepstream_tao_apps/apps/tao_others/deepstream-pose-classification at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)
