Hi there, I currently have a single-frame classification model in a DeepStream pipeline. The model performs okay, but not as well as I'd like. A potential improvement is to turn this single-frame classification model into a multi-frame LSTM classification model, such that the input to the model is some number of consecutive (or possibly separated but still ordered) frames, and the output is the classification.
(To be clear, I don't have a trained LSTM model at this time; I'm evaluating different approaches to this problem, and if getting an LSTM to work in DeepStream is prohibitive I might try something else.)
I understand how to use the nvstreammux and nvinfer plugins to perform inference on a single frame, but I'm not sure how to batch frames such that all frames in the batch are used as one input to this hypothetical LSTM model. I see references to LSTM support in the nvinferserver plugin, but I don't quite follow what is written there or whether it applies to what I'm trying to do. I've looked through the forums and found some similar questions, but not quite what I'm looking for, or at least not anything I understood. It's very possible I've missed an example somewhere; if so, please let me know.
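For context, here is roughly the single-frame pipeline I have today (a sketch with placeholder file and config names, not my exact command). As I understand it, the nvstreammux batch-size batches frames across sources within a single batch, not consecutive frames of one source over time, which is why I don't think I can simply raise it to feed a sequence to the model:

```
gst-launch-1.0 filesrc location=sample.h264 ! h264parse ! nvv4l2decoder \
  ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 \
  ! nvinfer config-file-path=config_infer_classifier.txt \
  ! nvvideoconvert ! nvdsosd ! nveglglessink
```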
Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the content of the configuration files, the command line used, and other details needed to reproduce.)
• Requirement details (This is for a new requirement. Include the module name, i.e. which plugin or which sample application, and a description of the function.)
Hi @fanzh , I've looked through that section and spent some time looking through the referenced fasterRCNN app, but I'm just not sure I follow, or whether it's applicable to what I'm trying to do. Ultimately my model still has a single input layer, that layer being a set of stacked frames as opposed to a single frame. (While my model is an LSTM, it doesn't need or want the "loop support" referenced there.) It does seem like the functionality in that example could maybe be adapted to what I want, possibly by storing incoming frames from nvstreammux as the "extra input tensors" to the model, but it certainly doesn't appear straightforward if so.
Are you aware of any other examples of what I am trying to do, or something at least somewhat similar? I've also thought about modifying the nvinfer plugin to stack batched frames into a single tensor before the inference call, which might work, but I imagine it would be tricky.
If the model is LSTM based and the next frame's inputs are generated from the previous frame's output data, that is what the loop support is for. If your model only needs a sequence of frames as input, please refer to /opt/nvidia/deepstream/deepstream-6.4/sources/apps/sample_apps/deepstream-3d-action-recognition. deepstream-3d-action-recognition is an example that demonstrates a sequence-based 3D or 2D model inference pipeline for action recognition.
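The key idea in that sample: the nvdspreprocess plugin, with a custom sequence library, accumulates N consecutive frames per source into one sequence tensor and attaches it as tensor meta, and nvinfer then consumes that tensor instead of doing its own per-frame preprocessing. A rough sketch of the relevant preprocess configuration (values are illustrative and key names are from memory; please check config_preprocess_3d_custom.txt in the sample directory for the exact contents):

```
[property]
enable=1
process-on-frame=1
unique-id=5
target-unique-ids=1
processing-width=224
processing-height=224
# 2 = custom tensor order, used here for the NCDHW sequence layout
network-input-order=2
# N;C;D;H;W - D consecutive frames stacked into one model input
network-input-shape=4;3;32;224;224
network-color-format=0
tensor-data-type=0
tensor-name=input_rgb
# custom library that buffers frames per source and builds the sequence tensor
custom-lib-path=./custom_sequence_preprocess/libnvds_custom_sequence_preprocess.so
custom-tensor-preparation-function=CustomSequenceTensorPreparation

[user-configs]
# 0 = use every frame; stride controls how far the sequence window
# advances between inferences
subsample=0
stride=1
```

On the nvinfer element, the sample enables input-tensor-meta=1 so the model input comes from the preprocess tensor meta rather than from nvinfer's own scaling.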