Performance comparison TK1 vs TX1

ManuKlause · November 22, 2016, 2:46pm

Hi all,

I have compared the performance of the TK1 and the TX1 by running some image processing on images of different sizes (1920 x 184, 1920 x 300, and 1920 x 1200 pixels). The results:

TX1

36 ms for 1920 x 1200 px

12 ms for 1920 x 300 px

8 ms for 1920 x 184 px

TK1

41 ms for 1920 x 1200 px

13 ms for 1920 x 300 px

9 ms for 1920 x 184 px

Why is there almost no difference for small images? The warp size is the same on both boards, although I am not writing CUDA kernels myself; I am using OpenCV4Tegra.
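To put the question in numbers, here is a minimal Python sketch that computes the TX1 speedup over the TK1 from the timings posted above. The dictionary and function names are only illustrative. Note that the timings are whole milliseconds, so a 1 ms gap on the small images is at the resolution of the measurement.

```python
# Timings in milliseconds as posted above, keyed by image height in pixels
# (all images are 1920 pixels wide).
tx1_ms = {1200: 36, 300: 12, 184: 8}
tk1_ms = {1200: 41, 300: 13, 184: 9}


def speedups(slow, fast):
    """Return {height: slow_time / fast_time} for heights present in both."""
    return {h: slow[h] / fast[h] for h in slow if h in fast}


# Ratio > 1 means the TX1 finished faster than the TK1.
tx1_over_tk1 = speedups(tk1_ms, tx1_ms)

for height, ratio in sorted(tx1_over_tk1.items()):
    print(f"1920 x {height}: TX1 is {ratio:.2f}x as fast as the TK1")
```

Seen this way, the relative gain is small at every size, not only for the small images; the absolute difference just shrinks with the image.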

jachen · November 25, 2016, 7:14am

Hello,
Comparing performance is quite a complicated problem.

First, I’m not sure what kind of algorithm you are running. Is GPU acceleration applied?
In general, GPU acceleration comes with some extra overhead, so for smaller images the speedup may be less obvious than for bigger ones.
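One rough way to check the overhead idea against the timings posted at the top of the thread is to fit time = fixed overhead + cost per image row with ordinary least squares. This is only a sketch: the linear model is an assumption, there are just three data points per board, and the helper name `fit_line` is made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Image heights in rows (width is always 1920) and the posted timings in ms.
heights = [184, 300, 1200]
tx1_ms = [8, 12, 36]
tk1_ms = [9, 13, 41]

# Intercept = estimated fixed cost per frame, slope = estimated cost per row.
tx1_fixed, tx1_per_row = fit_line(heights, tx1_ms)
tk1_fixed, tk1_per_row = fit_line(heights, tk1_ms)

print(f"TX1: fixed {tx1_fixed:.2f} ms, {tx1_per_row * 1000:.1f} us per row")
print(f"TK1: fixed {tk1_fixed:.2f} ms, {tk1_per_row * 1000:.1f} us per row")
```

Under this model both boards come out with a similar fixed cost of a few milliseconds per frame, and the TX1 only wins on the per-row part. A fixed cost of that size makes up a large share of an 8 or 9 ms frame, which would fit the overhead explanation.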

In your case, you can also check the system status with ‘tegrastats’. The system is probably not running at its maximum performance state.

br
Chenjian

ManuKlause · November 25, 2016, 7:36am

I am running the Gaussian filter engine and threshold (both via OpenCV with GPU acceleration) and the function findContours (on the CPU, without GPU acceleration). I have edited the rc.local file using https://github.com/yongxu/tx1-max-perf-script/blob/master/max_perf_script.sh
and a corresponding .sh file for the TK1.

ManuKlause · November 25, 2016, 7:55am

The result of running tegrastats with TX1:

RAM 1461/3853MB (lfb 363x4MB) cpu [44%,35%,43%,27%]@1734 EMC 25%@1600 AVP 0%@80 GR3D 30%@998 EDP limit 1734
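For reading such a line in a script, here is a small Python sketch that pulls the load and clock fields out of it. The regular expressions only target the format of the line above; other tegrastats versions print different fields, so treat the patterns and the dictionary keys as assumptions.

```python
import re

SAMPLE = ("RAM 1461/3853MB (lfb 363x4MB) cpu [44%,35%,43%,27%]@1734 "
          "EMC 25%@1600 AVP 0%@80 GR3D 30%@998 EDP limit 1734")


def parse_tegrastats(line):
    """Extract RAM, CPU, EMC and GPU (GR3D) figures from one tegrastats line."""
    stats = {}
    ram = re.search(r"RAM (\d+)/(\d+)MB", line)
    if ram:
        stats["ram_used_mb"] = int(ram.group(1))
        stats["ram_total_mb"] = int(ram.group(2))
    cpu = re.search(r"cpu \[([^\]]*)\]@(\d+)", line)
    if cpu:
        # Keep only per-core entries that are percentages.
        stats["cpu_load_pct"] = [int(v[:-1]) for v in cpu.group(1).split(",")
                                 if v.endswith("%")]
        stats["cpu_mhz"] = int(cpu.group(2))
    # Fields of the form "<LABEL> <load>%@<MHz>".
    for key, label in (("emc", "EMC"), ("gpu", "GR3D")):
        match = re.search(label + r" (\d+)%@(\d+)", line)
        if match:
            stats[key + "_load_pct"] = int(match.group(1))
            stats[key + "_mhz"] = int(match.group(2))
    return stats


stats = parse_tegrastats(SAMPLE)
print(f"GPU load {stats['gpu_load_pct']}% at {stats['gpu_mhz']} MHz")
print(f"CPU loads {stats['cpu_load_pct']} at {stats['cpu_mhz']} MHz")
```

In this single sample the GPU (GR3D) is at 30% load at 998 MHz, so the clocks look maxed out but the GPU is not saturated. One sample is not conclusive; logging several lines while the pipeline runs gives a better picture.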

ManuKlause · November 28, 2016, 10:02am

I am using the following frequencies: CPU 1734 MHz, GPU 998.4 MHz

kayccc · December 2, 2016, 3:53am

Hi ManuKlause,

When benchmarking the TK1 against the TX1, we have seen both some degradation and some improvement, caused by synchronization deep in the software architecture.

To confirm whether this is the problem, you can check the execution logs from both boards and compare the time spent on the GPU. The TX1 should be faster; if it is not, the slowdown is caused by the GPU architecture changes, and the code is probably sub-optimal for the TX1. If GPU execution on the TX1 is faster, you can improve the pipeline by making better use of streaming and synchronization.
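If no such log is at hand, a per-stage timer is one simple way to see where the time goes on each board. The Python sketch below is only an outline of the idea: `StageTimer` is a made-up helper, and the three stage functions are placeholders standing in for the Gaussian filter, threshold, and findContours steps, not OpenCV calls.

```python
import time
from collections import defaultdict


class StageTimer:
    """Collect wall-clock timings (in ms) per named pipeline stage."""

    def __init__(self):
        self.samples = defaultdict(list)

    def run(self, name, func, *args):
        start = time.perf_counter()
        result = func(*args)
        self.samples[name].append((time.perf_counter() - start) * 1000.0)
        return result

    def report(self):
        """Return the mean time per stage in milliseconds."""
        return {name: sum(v) / len(v) for name, v in self.samples.items()}


# Placeholder stages; in a real test these would wrap the actual calls.
def blur_stub(img):
    return [row[:] for row in img]


def threshold_stub(img):
    return [[1 if p > 127 else 0 for p in row] for row in img]


def contours_stub(img):
    return sum(map(sum, img))


timer = StageTimer()
frame = [[(x * y) % 256 for x in range(64)] for y in range(48)]
for _ in range(5):
    blurred = timer.run("gaussian", blur_stub, frame)
    binary = timer.run("threshold", threshold_stub, blurred)
    timer.run("findContours", contours_stub, binary)

averages = timer.report()
for name, ms in averages.items():
    print(f"{name}: {ms:.3f} ms on average")
```

When a timed stage launches GPU work asynchronously, the timed function has to wait for that work to finish (for example by including the download of the result), otherwise only the launch is measured. Comparing the per-stage averages from both boards then shows whether the GPU stages or the CPU-only findContours stage dominates.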

Hope this helps in your case.

Thanks
