Which is the best deep-learning framework to implement activity recognition (video processing) on J...

I am a beginner in deep learning. My objective is to implement an application based on Activity Recognition on the Jetson TX1 module. I found that Caffe, Torch, Theano and TensorFlow are currently the most widely used deep-learning frameworks. I am planning to use the built-in TensorRT package on the Jetson TX1 for inference.

Right now I am unsure which framework to choose. TensorFlow is growing rapidly, but I am concerned about its stability across version releases. I understand it is backed by Google and will be important in the future. Caffe mainly focuses on image processing applications. Even though many applications have been implemented on Jetson TX1 using Caffe, recent forum posts talk more about Torch and Theano as better options. Theano with Keras is also a good option, but debugging takes a lot of time and it is difficult to locate the problem. Torch is easy to debug and has good support, but the Lua language is completely new to me. I also need to decide which neural network, CNN or RNN, suits my task best.

Considering all the factors above, please suggest the best framework to select.

Hi viv_27, for your application of Activity Recognition, I think it depends on the neural network architecture that you choose to use. Have you identified any networks for achieving this yet? A quick search on GitHub turned these up:


These branches of Caffe use prototxt-style networks for completing the Activity Recognition task. TensorRT is also able to process prototxt networks and provide optimizations. So you can start with Caffe, make sure it’s running and the network works, and then run the prototxt through TensorRT.
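One way to check that the network still works after the TensorRT step is to compare its output with the Caffe output on the same input. Here is a minimal sketch of that check in plain Python. It assumes you have already dumped the class scores from both runs into lists; the helper names `top1` and `outputs_agree`, the tolerance, and the sample scores are all made up for illustration and are not part of Caffe or TensorRT.

```python
# Hypothetical helpers: compare class scores from the Caffe run (reference)
# with scores from the TensorRT-optimized run (candidate).

def top1(scores):
    """Return the index of the highest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def outputs_agree(reference, candidate, atol=1e-3):
    """True if both runs pick the same class and no score differs by more than atol."""
    if not reference or len(reference) != len(candidate):
        return False
    same_class = top1(reference) == top1(candidate)
    max_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    return same_class and max_diff <= atol


# Made-up scores for illustration only, not real measurements.
caffe_scores = [0.05, 0.80, 0.15]
tensorrt_scores = [0.0502, 0.7997, 0.1501]
print(outputs_agree(caffe_scores, tensorrt_scores))
```

The tolerance is an assumption as well. If you run inference in reduced precision, you will probably need a looser value, and comparing only the predicted class may be enough.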

For efficiently deploying DNNs to the embedded Jetson TX1 platform, I recommend using TensorRT, Caffe, or Torch. Torch has low overhead on embedded platforms, unlike TensorFlow/Keras. Of course, I first recommend researching your particular task (Activity Recognition) in the literature and on GitHub, and then working backwards from there.