• Hardware platform: Orin AGX 64GB
• DeepStream version: 7 or 6.2
Hi!
I’m wondering what the best way is to run spatio-temporal models, such as RNNs or vision transformers, that aggregate information over a longer time span inside DeepStream. I want to move away from static-image object detectors. I know of the following sample, but it seems outdated and I’m not sure how easy it is to adapt: DeepStream 3D Action Recognition App — DeepStream 6.2 Release documentation (nvidia.com)
Is there a reference implementation or sample for this?
Thanks!