Caffe and ImageNet

Hi all,
I’m testing the NVIDIA Jetson TK1. This post, http://devblogs.nvidia.com/parallelforall/embedded-machine-learning-cudnn-deep-neural-network-library-jetson-tk1/, states that the Jetson with Caffe and cuDNN can process (I guess they mean classify) an image in just 34 ms. However, with the classification script I wrote in Python, the performance I get is worse than that. Processing a single image takes 0.6 s, and processing 15 images in parallel gives an average of 0.08 s per image, so even when classifying in parallel I can’t reach the stated performance.
I’ve installed the latest L4T, Caffe and cuDNN from their respective repositories. Do you have any suggestions about this strange behavior, or am I missing something?

Thank you all
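
One rough way to read the two timings above is to assume the total time is a fixed per-call overhead plus a constant cost per image. That linear model is an assumption, not something measured, and the helper name `fit_fixed_and_marginal` is made up for this sketch. It only uses the two numbers quoted in the post (0.6 s for 1 image, 0.08 s average for 15 images).

```python
def fit_fixed_and_marginal(n1, total1, n2, total2):
    """Fit total = fixed + marginal * n through two (n, total) measurements.

    Assumes cost grows linearly with the number of images, which may not
    hold on real hardware.
    """
    marginal = (total2 - total1) / (n2 - n1)
    fixed = total1 - marginal * n1
    return fixed, marginal


# Numbers taken from the post: 0.6 s for one image,
# 0.08 s average per image over a batch of 15.
single_total = 0.6
batch_total = 0.08 * 15

fixed, marginal = fit_fixed_and_marginal(1, single_total, 15, batch_total)
print("fixed overhead: %.3f s" % fixed)
print("marginal cost: %.1f ms per image" % (marginal * 1000))
```

Under that assumption most of the 0.6 s would be per-call overhead and the per-image cost would be much closer to the figure in the blog post, which suggests measuring the overhead separately before concluding that classification itself is slow.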

Have you maximised CPU, GPU and EMC clocks?

More about setting those can be found on the wiki:
http://elinux.org/Jetson/Performance

Here’s a blog entry for setup of Caffe on the Jetson which discusses the steps to recreate their results:

and with cuDNN:

Note that Python is not used in the examples. The blog posts include step-by-step video instructions.

I corrected my setup, and Caffe now shows the expected results. However, Python is still far from this. I’m wondering whether this is because the benchmark repeats the same test on every iteration, so what we see is a cached result rather than what you get during normal execution.
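
To check whether the gap comes from one-time setup or caching rather than from the classification itself, the first call can be timed separately from the repeated ones. The following is only a sketch: `time_calls` and `FakeClassifier` are hypothetical names, and the fake classifier simulates a one-time setup cost with a sleep instead of calling Caffe, so the numbers it prints say nothing about the Jetson.

```python
import time
from timeit import default_timer as timer


def time_calls(fn, arg, repeats=10):
    """Return (first_call_seconds, mean_seconds_of_later_calls) for fn(arg)."""
    start = timer()
    fn(arg)
    first = timer() - start

    samples = []
    for _ in range(repeats):
        start = timer()
        fn(arg)
        samples.append(timer() - start)
    return first, sum(samples) / len(samples)


class FakeClassifier(object):
    """Stand-in for a real classifier; pays a setup cost on the first call only."""

    def __init__(self):
        self._ready = False

    def __call__(self, image):
        if not self._ready:
            time.sleep(0.05)  # simulated one-time initialisation
            self._ready = True
        return sum(image) % 1000  # placeholder for the real prediction


first, steady = time_calls(FakeClassifier(), list(range(1000)))
print("first call: %.4f s, later calls: %.6f s on average" % (first, steady))
```

Replacing `FakeClassifier()` with the real Python classification call would show whether a single cold call and the repeated calls differ in the same way as the Python script and the Caffe benchmark do.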