Input frame format for slowfastNet

I am trying to deploy this model (slowfastNet) with mmaction framework,

First question: the model does not classify frame by frame; it needs a batch of frames to classify. How can I do this?
Second question: the SlowFast model has two paths (slow and fast), and each path needs a specific number of frames from the whole input. For example, if my batch is 64 frames, the fast path needs only 32 frames and the slow path needs even fewer, and those frames are chosen with a specific skip offset too. How can I do this as well?
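For illustration, the two-path frame selection described above can be sketched with NumPy slicing. The stride values here are assumptions chosen to match the 64 → 32 example in the question, not necessarily the official SlowFast defaults:

```python
import numpy as np

def sample_slowfast(clip, fast_stride=2, alpha=4):
    """Split one clip into fast- and slow-path inputs.

    clip: (T, H, W, C) array, e.g. T = 64 frames.
    fast_stride: skip offset for the fast path (64 -> 32 frames).
    alpha: the slow path keeps every alpha-th fast-path frame.
    These stride values are illustrative assumptions.
    """
    fast = clip[::fast_stride]   # e.g. 64 -> 32 frames
    slow = fast[::alpha]         # e.g. 32 -> 8 frames
    return slow, fast

clip = np.zeros((64, 224, 224, 3), dtype=np.uint8)
slow, fast = sample_slowfast(clip)
print(fast.shape[0], slow.shape[0])  # 32 8
```

The actual frame counts and skip offsets would come from the checkpoint's training config in mmaction.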


@kayccc @mchi @Fiona.Chen

Hi @aya95,
Is it possible to draw a diagram to illustrate the requirement?

it needs a batch of frames to classify on

Is the batch of frames from the same stream?

Does the model have two inputs for slow and fast respectively?

And please provide your setup info, as in other topics.


Hello @mchi,
I am also looking for input on running frames through the SlowFast network with DeepStream.
The diagram to illustrate the requirement could be seen here:

Is the batch of frames from the same stream?

As can be seen from the diagram, the batch of frames is from the same stream, with one path at a higher frame rate (fast path) and one at a lower frame rate (slow path).

Does the model have two inputs for slow and fast respectively?


My setup (and I hope the OP's as well) is a Jetson NX device running DeepStream 5.0, and we want to run this model through DeepStream.


I’m wondering how to run it with TensorRT.

There are two inputs, so for each inference shot we need to feed data into both of them. The question is: for the fast input we can feed new frames on each inference shot, but what about the slow input? Do we feed dummy data or old frames?
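One possible scheme, sketched below as an assumption rather than anything DeepStream provides, is a sliding window: keep a ring buffer of decoded frames, and on each inference shot take the newest frames for the fast input and a strided subset of them for the slow input, so the slow input reuses old frames instead of dummy data. `SlowFastBuffer` is a hypothetical helper name:

```python
from collections import deque

import numpy as np

class SlowFastBuffer:
    """Sliding window over one stream (hypothetical helper, not a DeepStream API)."""

    def __init__(self, fast_len=32, alpha=4):
        self.alpha = alpha
        # deque with maxlen drops the oldest frame automatically
        self.frames = deque(maxlen=fast_len)

    def push(self, frame):
        self.frames.append(frame)

    def ready(self):
        return len(self.frames) == self.frames.maxlen

    def inputs(self):
        """Return (slow, fast) stacks for one inference shot."""
        fast = np.stack(self.frames)   # newest fast_len frames, e.g. (32, H, W, C)
        slow = fast[::self.alpha]      # strided reuse, e.g. (8, H, W, C)
        return slow, fast

buf = SlowFastBuffer()
for t in range(40):  # stream in 40 frames; buffer keeps the last 32
    buf.push(np.full((224, 224, 3), t, dtype=np.uint8))
slow, fast = buf.inputs()
```

With this scheme every inference shot sees fresh data on both inputs, just at different temporal strides.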

I have another question depending on your reply:
does this mean DeepStream can classify a batch of frames at once?

If the frames in the batch come from different streams, the answer is YES.
If they come from the same stream, the answer is NO.

But, per the info from @user137, I don’t understand why batch is needed.

So we cannot deploy a model that has an input of shape (64, 224, 224, 3)?
Just to make sure ^^

I’m not sure, because you still haven’t answered the question I asked twice above: is the batch from the same stream or from different streams?

Yes, from the same stream.
But regarding the batch question, I am speaking generally, not only about SlowFast: can we make the input shape carry a batch of frames from one stream, for any model whose input accepts more than one frame, like (64, 224, 224, 3)?

I just want to clarify what we mean by batch here. From the figure attached in @user137's reply, and setting DeepStream aside, SlowFast relies on the temporal change across frames, so it needs a number of successive frames as input in order to extract temporal features. That is what we meant by batch.

Now, if we want to run SlowFast with DeepStream, can we, from a single stream, input say 5 successive frames to the model? Right now DeepStream feeds a single frame to the model; can we input N frames?

With that being said, can we input two batches with different frame counts, say 30 frames for the fast path and 5 frames for the slow path in the SlowFast model?


DeepStream nvstreammux does not support batching the frames from one source.

Maybe there is a workaround solution for this (say fast path 30 frames/batch, slow path 5 frames/batch):

  1. Have a plugin before nvstreammux that separates the frames from one source into 35 streams (sources) and sends the frames to nvstreammux in sequence (this is important; otherwise the frames in the batch will be out of order). ===> Sorry! nvstreammux does support forming a batch from one source, so this plugin before nvstreammux is not needed. You just need to set the batch size of both nvstreammux and nvinfer to the sum of (frames to slow path + frames to fast path), or to the fast-path frame count if the slow path can reuse frames from the fast path.
  2. Set the batch size of nvstreammux and nvinfer to 35.
  3. After nvinfer gets the 35 frames, modify the nvinfer code to separate the 35 frames into the two inputs.
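Step 2 might look like the following pipeline and config fragment, a sketch only: the file path, resolution, and config-file name are placeholders, and the 35-frame split for the two inputs still requires the custom nvinfer change from step 3:

```shell
# Pipeline sketch for batching 35 frames from one source (values illustrative).
gst-launch-1.0 \
  uridecodebin uri=file:///path/to/video.mp4 ! mux.sink_0 \
  nvstreammux name=mux batch-size=35 width=224 height=224 \
    batched-push-timeout=40000 ! \
  nvinfer config-file-path=slowfast_config.txt ! fakesink

# The [property] section of slowfast_config.txt must match:
#   batch-size=35
```

Both batch sizes have to agree, otherwise nvinfer will re-batch and the temporal ordering the model depends on is lost.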