I am trying to use an open-source framework, OpenFace (https://cmusatyalab.github.io/openface/), which does computer vision with neural nets, on the Jetson TX2. I wrote Python code that makes use of this framework. My current implementation needs between 3.679 s and 4.805 s per result on my laptop's CPU, while the same code needs around 43 seconds on the Jetson!
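For reference, the timings above were measured roughly like this. This is only a sketch: `run_inference` is a placeholder I made up for the actual OpenFace call in my code, not a real function name.

```python
import time

def run_inference():
    # Placeholder for the OpenFace call being timed; the real code
    # loads an image and runs the network here.
    time.sleep(0.1)

start = time.time()
run_inference()
elapsed = time.time() - start
print("Inference took %.3f sec" % elapsed)
```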
I tried following the steps suggested at this link, i.e. using Anaconda: https://developer.nvidia.com/how-to-cuda-python. But it seems the Jetson's architecture isn't supported. When running the .sh installer I downloaded from Anaconda's website, I get:
"cannot execute binary file: Exec format error. ERROR: cannot execute native linux-64 binary, output from 'uname -a' is:
Linux tegra-ubuntu 4.4.15-tegra #1 SMP PREEMPT Wed Feb 8 18:06:32 PST 2017 aarch64 GNU/Linux"
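The "Exec format error" is consistent with an architecture mismatch: the `uname` output shows the TX2 is aarch64, while the standard Anaconda installers ship x86/x86_64 binaries. A quick standard-library check to confirm the host architecture from Python:

```python
import platform

# On the Jetson TX2 this prints 'aarch64'; on a typical laptop, 'x86_64'.
# An x86/x86_64 installer binary cannot run on an aarch64 host, which is
# exactly what 'Exec format error' means.
arch = platform.machine()
print(arch)
if arch == "aarch64":
    print("x86/x86_64 Anaconda installers will not run on this machine")
```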
The issue is the same whether I use the 32-bit or the 64-bit installer.
I heard there are CUDA compilers/profilers that allow some sort of significant "automatic" optimization without me having to rewrite my code, if I understood correctly.
What options do I have to speed up my Python code without necessarily rewriting everything?
On a tangent: I am using Python 2.7.12.