I have compared the performance of TK1 vs TX1. I ran some image processing on images of different sizes (1920 x 184 pixels, 1920 x 300 pixels, 1920 x 1200 pixels). The results, listed by image height:
TX1
36 ms for 1200 px
12 ms for 300 px
8 ms for 184 px
TK1
41 ms for 1200 px
13 ms for 300 px
9 ms for 184 px
Why is there almost no difference for small images? The warp size is the same. (I am not writing CUDA kernels myself, though; I am using OpenCV4Tegra.)
Hello,
Comparing performance is quite a complicated problem.
First, I’m not sure what kind of algorithm you are running. Is GPU acceleration applied?
Generally, GPU acceleration comes with some extra fixed overhead. So for smaller pictures, the speedup may be less obvious than for bigger pictures.
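To make the overhead argument concrete, here is a rough sketch that fits the timings from the question to a simple model, time = fixed overhead + per-row cost x image height. The linear model is only an assumption for illustration (three data points per board cannot confirm it), and the function name `fit_line` is made up for this example.

```python
# Timings copied from the question: (image height in rows, milliseconds).
TX1 = [(184, 8.0), (300, 12.0), (1200, 36.0)]
TK1 = [(184, 9.0), (300, 13.0), (1200, 41.0)]


def fit_line(samples):
    """Least-squares fit of time = overhead + per_row * rows.

    Assumes the cost grows linearly with image height (an illustrative
    model, not something the measurements prove).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    per_row = sxy / sxx                     # slope: ms per image row
    overhead = mean_y - per_row * mean_x    # intercept: fixed cost in ms
    return overhead, per_row


for name, samples in (("TX1", TX1), ("TK1", TK1)):
    overhead, per_row = fit_line(samples)
    print(f"{name}: fixed ~{overhead:.1f} ms, per row ~{per_row * 1000:.1f} us")
```

With these numbers the fitted fixed cost comes out at roughly 3.4 ms on both boards, while the per-row cost is lower on TX1. Under this assumed model, the fixed part makes up a large share of the 184-row timing, which would explain why the two boards look almost identical for small images.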
In your case, you can also check the system status with ‘tegrastats’. The system is probably not running at its maximum performance state.
When we benchmarked TK1 against TX1, we saw some degradation as well as some improvement, caused by synchronization deep in the software architecture.
To confirm whether this is the problem, you can check the execution logs from both boards and compare the time spent on the GPU. TX1 should be faster. If it is not, the slowdown is caused by GPU architecture changes, and the code is probably sub-optimal for TX1. If GPU execution on TX1 is faster, you can improve the pipeline by making better use of streaming and synchronization.
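If no per-stage log is at hand, one way to get comparable numbers on both boards is to time each stage of the pipeline separately. This is only a minimal sketch: `time_stage`, `profile` and the `fake_*` stages are hypothetical names used for illustration, and it assumes each stage call blocks until its work has finished (otherwise wall-clock timing does not reflect the GPU time).

```python
import statistics
import time


def time_stage(fn, warmup=3, repeats=20):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # discard the first calls: they may include one-time setup cost
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def profile(stages):
    """Time each (name, callable) pair separately; returns {name: median_ms}."""
    return {name: time_stage(fn) for name, fn in stages}


# Hypothetical placeholder stages. Replace them with the real upload,
# processing and download calls of your own pipeline.
def fake_upload():
    time.sleep(0.001)


def fake_process():
    time.sleep(0.002)


def fake_download():
    time.sleep(0.001)


results = profile([
    ("upload", fake_upload),
    ("process", fake_process),
    ("download", fake_download),
])
for name, ms in results.items():
    print(f"{name}: {ms:.2f} ms")
```

Running the same script on TK1 and TX1 shows whether the processing stage itself gets faster on TX1 or whether the time is spent in transfers and synchronization around it.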