Hi there, I currently have a single-frame classification model in a DeepStream pipeline. The model performs okay, but not as well as I'd like. A potential improvement is to turn this single-frame classification model into a multi-frame LSTM classification model, such that the input to the model is some number of consecutive (or possibly separated but still ordered) frames, and the output is the classification.
(To be clear, I don't have a trained LSTM model at this time; I'm evaluating different approaches to this problem, and if getting an LSTM to work in DeepStream is prohibitive I might try something else.)
I understand how to use the nvstreammux and nvinfer plugins to perform inference on a single frame, but I'm not sure how to batch frames such that all frames in the batch are used as one input to this hypothetical LSTM model. I see references to LSTM support in the nvinferserver plugin, but I don't quite follow what is written there or whether it applies to what I'm trying to do. I've looked through the forums and found some similar questions, but not quite what I'm looking for, or at least not anything I understood. It's very possible I've missed an example somewhere; if so, please let me know.
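For context, here is roughly the single-frame pipeline I have today (a sketch with placeholder file and config names, not my exact command). As I understand it, the nvstreammux batch-size batches frames across sources within a single batch, not consecutive frames of one source over time, which is why I don't think I can simply raise it to feed a sequence to the model:

```
gst-launch-1.0 filesrc location=sample.h264 ! h264parse ! nvv4l2decoder \
  ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 \
  ! nvinfer config-file-path=config_infer_classifier.txt \
  ! nvvideoconvert ! nvdsosd ! nveglglessink
```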
Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the content of the configuration files, the command line used, and other details needed to reproduce.)
• Requirement details (This is for a new requirement. Include the module name, i.e. which plugin or which sample application, and a description of the function.)
Hi @fanzh , I've looked through that section and spent some time looking through the referenced fasterRCNN app, but I'm just not sure I follow, or whether it's applicable to what I'm trying to do. Ultimately my model still has a single input layer, that layer being a set of stacked frames as opposed to a single frame. (While my model is an LSTM, it doesn't need or want the "loop support" referenced there.) It does seem like the functionality in that example could maybe be adapted to what I want, possibly by storing incoming frames from nvstreammux as the "extra input tensors" to the model, but it certainly doesn't appear straightforward if so.
Are you aware of any other examples of what I am trying to do, or something at least somewhat similar? I've also thought about modifying the nvinfer plugin to stack batched frames into a single tensor before the inference call, which might work, but I imagine it would be tricky.
If the model is LSTM based and the next frame's inputs are generated from the previous frame's output data, that is what the loop support is for. If your model only needs a sequence of frames as input, please refer to /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-3d-action-recognition. deepstream-3d-action-recognition is an example that demonstrates a sequence-based 3D or 2D model inference pipeline for action recognition.
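The key idea in that sample: the nvdspreprocess plugin, with a custom sequence library, accumulates N consecutive frames per source into one sequence tensor and attaches it as tensor meta, and nvinfer then consumes that tensor instead of doing its own per-frame preprocessing. A rough sketch of the relevant preprocess configuration (values are illustrative and key names are from memory; please check config_preprocess_3d_custom.txt in the sample directory for the exact contents):

```
[property]
enable=1
process-on-frame=1
unique-id=5
target-unique-ids=1
processing-width=224
processing-height=224
# 2 = custom tensor order, used here for the NCDHW sequence layout
network-input-order=2
# N;C;D;H;W - D consecutive frames stacked into one model input
network-input-shape=4;3;32;224;224
network-color-format=0
tensor-data-type=0
tensor-name=input_rgb
# custom library that buffers frames per source and builds the sequence tensor
custom-lib-path=./custom_sequence_preprocess/libnvds_custom_sequence_preprocess.so
custom-tensor-preparation-function=CustomSequenceTensorPreparation

[user-configs]
# 0 = use every frame; stride controls how far the sequence window
# advances between inferences
subsample=0
stride=1
```

On the nvinfer element, the sample enables input-tensor-meta=1 so the model input comes from the preprocess tensor meta rather than from nvinfer's own scaling.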