Jetson TX1 takes three times as long as x86 (i3)

Hi there,

I started working with the Jetson TX1 and ran into a problem with execution time, so I investigated it in more detail. I wrote a Python script that does image detection. For this script, the execution time stays the same whether it runs on the CPU or the GPU. The same code also runs faster on my x86 (i3) machine:
JETSON - 67 sec
i3 - 23 sec

The steps in my Python script: 1. crop image (20 sec), 2. load trained model (10 sec), 3. detect object (37 sec).

So can you tell me why the NVIDIA Jetson TX1 takes three times as long as a normal x86 (i3) PC for this Python script? And why does the script take the same time whether it runs on the CPU or the GPU on the Jetson?

The CPUs on a mobile device are generally slower than a PC CPU. For GPU processing, you can expect close to 10x as many CUDA cores on a PC as on a Jetson. The trade-off is power: a desktop PC would drain a battery in an extremely short time, while a Jetson could run all day from the same battery. Compared to other mobile chipsets, the Jetson is much faster when the GPU is used with data suited to the GPU.

Speed with and without a GPU depends on many things. One is tuning: using the right number of concurrent kernels, each working on the right size of data. You have to tune the code for the specific use case. The way data is divided and submitted for the fastest processing on a PC will differ from the best way to submit the same data on a Jetson. You also have to choose a different way of copying data back and forth on a Jetson than on a PC. The PC GPU has dedicated memory reached over PCIe, while the Jetson GPU ties directly to a memory controller and uses the slower system RAM. How memory access is implemented has implications for performance and caching.

You can use the tegrastats program, found in the home directory of the nvidia or ubuntu user on the Jetson, to see which resources are actually being used. For anyone to comment more specifically, you’d have to give a lot more detail on the data and operations involved, along with how many concurrent GPU kernels are being called.
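
To gather that kind of detail, each stage of the script can be timed separately while tegrastats runs in another terminal. The following is only a minimal sketch: the functions crop_image, load_model and detect_objects are hypothetical placeholders (here they just sleep), standing in for whatever the real script does at each step.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed wall-clock seconds


@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


# Hypothetical placeholders for the three stages of the real script.
def crop_image():
    time.sleep(0.01)


def load_model():
    time.sleep(0.01)


def detect_objects():
    time.sleep(0.01)


with timed("crop"):
    crop_image()
with timed("load_model"):
    load_model()
with timed("detect"):
    detect_objects()

for stage, seconds in timings.items():
    print(f"{stage:12s} {seconds:.3f} sec")
```

Running the same measurement on both the Jetson and the PC shows which stage accounts for the difference, instead of only comparing the totals.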


Using the GPU requires code written with CUDA.
If you are working in Python, please install PyCUDA first and port the relevant operations to PyCUDA.
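
As a rough illustration of what such a port involves, here is a minimal sketch using PyCUDA's gpuarray module. The function name double_on_gpu and the doubling operation are made up for illustration and are not taken from the original script. The CPU fallback is an assumption added so the snippet still runs on a machine where PyCUDA or a CUDA device is not available.

```python
import numpy as np


def double_on_gpu(values):
    """Double every element on the GPU via PyCUDA, or on the CPU as a fallback.

    Returns the result array and the name of the device that was used.
    """
    try:
        import pycuda.autoinit  # noqa: F401  (creates a CUDA context on import)
        import pycuda.gpuarray as gpuarray
    except Exception:
        # PyCUDA missing or no usable CUDA device: compute with NumPy instead.
        return values * 2, "cpu"

    values_gpu = gpuarray.to_gpu(values)  # copy host array to device memory
    doubled = (2 * values_gpu).get()      # multiply on the GPU, copy result back
    return doubled, "gpu"


data = np.arange(8, dtype=np.float32)
result, device = double_on_gpu(data)
print(device, result)
```

Note the explicit copies to and from the device. For small inputs like this one the copies cost more than the computation, so the GPU only pays off when enough work is done per transfer.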

If the script does not use CUDA, your results are simply a comparison between the i3 and the Cortex-A57:
a desktop-class CPU versus an embedded-class CPU.

The Jetson’s strength is its GPU computing power.
To get the best performance, it’s recommended to implement your use case on the GPU.
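
How much the GPU helps depends on how much of the run time is actually moved onto it. Here is a rough estimate using the stage timings from the question. It assumes those timings were measured on the Jetson (they add up to the 67 sec total) and that only the detection stage is moved to the GPU. The speedup factors are hypothetical, not measurements.

```python
# Stage timings from the question, in seconds (assumed to be the Jetson run).
stages = {"crop": 20.0, "load_model": 10.0, "detect": 37.0}


def total_time(stages, accelerated, speedup):
    """Total run time if only the `accelerated` stage becomes `speedup` times faster."""
    return sum(
        seconds / speedup if name == accelerated else seconds
        for name, seconds in stages.items()
    )


# Hypothetical GPU speedups for the detection stage only.
for speedup in (1, 2, 10, 100):
    print(f"detect x{speedup}: {total_time(stages, 'detect', speedup):.1f} sec")
```

Under these assumptions the total cannot drop below 30 sec however fast detection becomes, because cropping and model loading still run on the CPU. Those stages would need attention as well.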