Machine Learning with Video Data to Identify Actions

Hi all,

I’m hoping this very simple, generic question doesn’t annoy the forum, but the answer would help me understand how I need to apply NVIDIA GPU programming to a project.

Would someone be able to explain the difference between machine learning on static images and machine learning on videos of actions?

By that I mean: labelled still images are fed through a neural network, where every image shows the same object to be detected (dog, cat, face), but perhaps in a different pose or setting, and using backpropagation the network’s weights are adjusted until it’s able to identify the object.

If you have video as your source and you need to identify patterns, how do you process it? If you break it up into individual frames, the action isn’t necessarily captured in a single frame/image. Or does that matter?

Does the NN just learn what the end of the pattern looks like and look for that split second in the video stream?

You might want to study recurrent neural networks.
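
A common pattern for action recognition is to run a CNN over each frame to get a feature vector, then feed the sequence of vectors to a recurrent layer so the classifier sees the whole motion rather than a single frame. Below is a minimal PyTorch sketch of that idea; PyTorch/torchvision, the ResNet-18 backbone, the class count, and the clip length are all illustrative assumptions, not something from this thread:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=10, hidden=256):  # illustrative sizes
        super().__init__()
        backbone = models.resnet18(weights=None)   # per-frame feature extractor
        backbone.fc = nn.Identity()                # keep the 512-d feature vector
        self.cnn = backbone
        self.rnn = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))      # (B*T, 512): one vector per frame
        feats = feats.reshape(b, t, -1)            # restore the time axis
        _, (h, _) = self.rnn(feats)                # final hidden state summarizes the clip
        return self.head(h[-1])

model = CNNLSTM()
logits = model(torch.randn(2, 16, 3, 112, 112))    # 2 clips of 16 frames each
print(logits.shape)                                # torch.Size([2, 10])
```

The LSTM’s final hidden state summarizes the clip, so backprop adjusts the weights based on how the frames change over time, not just on what any single frame contains.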

True, but I’d appreciate any insight into the principles of processing video for neural networks and using video data in machine learning.
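
On the processing side, the usual first step is to sample a short clip of frames and hand the whole stack to the model, so the time axis (the action itself) is preserved rather than lost to single-frame classification. Here is a minimal sketch using OpenCV, where the file name "action.mp4" and the evenly spaced sampling scheme are hypothetical choices:

```python
import cv2
import numpy as np

def sample_clip(path, num_frames=16, size=(112, 112)):
    """Return num_frames evenly spaced, resized frames as one (T, H, W, 3) array."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in np.linspace(0, total - 1, num_frames).astype(int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))   # jump to the chosen frame
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, size))
    cap.release()
    return np.stack(frames)  # the time axis is what encodes the action

clip = sample_clip("action.mp4")  # hypothetical file name
print(clip.shape)                 # (16, 112, 112, 3)
```

A stack like this is what a 3D CNN, or the CNN + LSTM sketch above, would consume; the key principle is that the network’s input spans multiple time steps, so it can learn from motion rather than from one split-second frame.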