Jetson TX1 takes triple the time of x86 (i3)

Hi there,

I started to work with Jetson TX1 and I found some problems with execution time. I investigated it in more details. I wrote the python code based on image detection. For this python script CPU execution time and GPU execution time remains same. Also this code work fast in my X86 (i3) machine.
JETSON - 67 sec
i3 - 23 sec

My Python script's steps: 1. crop image (20 sec), 2. load trained model (10 sec), 3. detect objects (37 sec).
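To confirm where the time goes, each stage can be timed separately. This is a minimal sketch with placeholder stage functions standing in for the poster's actual crop/load/detect code (those names are hypothetical, not from the original script):

```python
import time

timings = {}

def timed(label, fn, *args, **kwargs):
    """Run fn, record its wall-clock time in the timings dict, return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[label] = time.perf_counter() - start
    return result

# Placeholder stages standing in for the real pipeline:
image = timed("crop", lambda: "cropped-image")
model = timed("load_model", lambda: "model")
boxes = timed("detect", lambda: ["box"])

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Running this around each real stage on both machines makes it clear whether the Jetson is slower uniformly or only in one stage (for example, model loading from slower eMMC storage vs. an SSD).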

So can you tell me: why does the NVIDIA Jetson TX1 take triple the time of a normal x86 (i3) PC for this Python script? And why does the script take the same time whether it runs on the CPU or the GPU of the Jetson?

The CPUs on a mobile device are always slower than a PC CPU. For GPU processing, you can expect a PC to have close to 10x as many CUDA cores as a Jetson. The trade-off is that a desktop PC would drain a battery in an extremely short time, while a Jetson could run all day long from the same battery. Compared to any of the other mobile chipsets, the Jetson is much, much faster when the GPU is used with data suited to the GPU.

The topic of speed with and without a GPU depends on many things. One of those is tuning: using the right number of concurrent kernels operating on the right size of data. You have to tune the code for specific use cases. The fastest configuration for a PC will differ in how data is divided and submitted for processing compared to the same data being submitted to a Jetson. You also have to handle copying data back and forth differently on a Jetson than on a PC: the PC has dedicated GPU memory accessed over PCIe, while the Jetson GPU is tied directly to a memory controller sharing slower system RAM. There are implications for performance and caching depending on how memory access is implemented.
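As a concrete illustration of the host/device copy choice described above, here is a hedged sketch using page-locked (pinned) host memory with pyCUDA. It assumes pyCUDA is installed and a CUDA-capable GPU is visible; when pyCUDA is absent it falls back to a plain in-RAM copy so the logic still runs:

```python
import numpy as np

def copy_roundtrip(a):
    """Copy array a to the device and back, using pinned host buffers
    when pyCUDA is available; otherwise just copy within host RAM."""
    try:
        import pycuda.autoinit  # creates a CUDA context on import
        import pycuda.driver as cuda

        pinned = cuda.pagelocked_empty(a.shape, a.dtype)  # pinned host buffer
        pinned[:] = a
        d_buf = cuda.mem_alloc(a.nbytes)
        cuda.memcpy_htod(d_buf, pinned)   # host -> device
        out = cuda.pagelocked_empty(a.shape, a.dtype)
        cuda.memcpy_dtoh(out, d_buf)      # device -> host
        return np.asarray(out)
    except ImportError:
        # No pyCUDA installed: same result, CPU only.
        return a.copy()

x = np.ones(1024, dtype=np.float32)
assert np.array_equal(copy_roundtrip(x), x)
```

On a Jetson, where the GPU shares the system RAM, mapped pinned memory can avoid the explicit copy entirely, which is exactly the kind of platform-specific tuning the paragraph above refers to.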

You might use the tegrastats program (in the home directory of the nvidia or ubuntu user on the Jetson) to see what resources are actually being used. For anyone to make any specific comment, you'd have to give a lot more detail on the data and operations used, along with how many concurrent GPU kernels are being launched.

Hi,

Using the GPU requires writing code with CUDA.
If you are working in Python, please install pyCUDA first and convert your operations to pyCUDA.
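For reference, a minimal pyCUDA kernel looks like the sketch below (it assumes pyCUDA and a CUDA-capable GPU; a NumPy fallback is included so the result is the same on a machine without CUDA):

```python
import numpy as np

def gpu_double(a):
    """Double every element of a float32 array, on the GPU when possible."""
    try:
        import pycuda.autoinit  # creates a CUDA context on import
        import pycuda.driver as cuda
        from pycuda.compiler import SourceModule

        mod = SourceModule("""
        __global__ void double_it(float *a)
        {
            int i = threadIdx.x + blockIdx.x * blockDim.x;
            a[i] *= 2.0f;
        }
        """)
        double_it = mod.get_function("double_it")
        d_a = cuda.mem_alloc(a.nbytes)
        cuda.memcpy_htod(d_a, a)
        double_it(d_a, block=(int(a.size), 1, 1), grid=(1, 1))
        out = np.empty_like(a)
        cuda.memcpy_dtoh(out, d_a)
        return out
    except ImportError:
        # No pyCUDA installed: plain CPU path, same result.
        return a * 2.0

x = np.arange(4, dtype=np.float32)
print(gpu_double(x))  # [0. 2. 4. 6.]
```

Without explicit CUDA code like this (or a framework that dispatches to CUDA/cuDNN for you), a Python script runs entirely on the CPU, which is why selecting "GPU" made no difference in your timings.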

If not, the results are a comparison between an i3 and an A57:
a desktop-class CPU vs. an embedded-class CPU.

Jetson's strength is its GPU computing power.
It's recommended to implement your use case on the GPU to get the best performance.

Thanks.