Comparing TK1 and TX1 GPU specs with OpenCV4Tegra mog2 algorithm

Hello,
I’m trying to compare the GPU performance of the TK1 and TX1 using OpenCV4Tegra’s MOG2 algorithm, and I would like to know whether my experimental results are correct.

When I run the code below inside a video loop, I get the following results.

□ENV:
A 1-minute AVI sample video at 1920x1080@60fps, so 3600 frames are processed in total.
I calculated the average processing time per frame.

□RESULT:
Processing time using the mog2() function:
TK1: 17.7355 ms, TX1: 12.6975 ms

Upload + download time using d_frame.upload() and d_fgmask.download():
TK1: 11.6649 ms, TX1: 5.004511 ms

The GPU memory upload + download result is about what I expected,
but I thought the processing time would be far lower on TX1, since TX1’s GFLOPS figure is more than 2.5 times that of TK1.
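
For reference, the ratios from my measurements above work out to roughly 11.6649 / 5.004511 ≈ 2.3x faster for the memory transfers on TX1, but only 17.7355 / 12.6975 ≈ 1.4x faster for the MOG2 processing itself.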

Can someone help me confirm whether these results are reasonable, especially the processing time?

Thank you.

    Mat fgmask;

    gettimeofday(&t1, NULL);

    /* upload the captured frame to GPU memory */
    d_frame.upload(cap);

    gettimeofday(&t2, NULL);

    /* GPU MOG2 background subtraction */
    mog2(d_frame, d_fgmask, mog2_param.learningCoef);

    gettimeofday(&t3, NULL);

    /* download the foreground mask back to host memory */
    d_fgmask.download(fgmask);

    gettimeofday(&t4, NULL);

    /* upload time (ms) */
    elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0;
    elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0;
    cout << elapsedTime << ",";

    /* processing time (ms) */
    elapsedTime = (t3.tv_sec - t2.tv_sec) * 1000.0;
    elapsedTime += (t3.tv_usec - t2.tv_usec) / 1000.0;
    cout << elapsedTime << ",";

    /* download time (ms) */
    elapsedTime = (t4.tv_sec - t3.tv_sec) * 1000.0;
    elapsedTime += (t4.tv_usec - t3.tv_usec) / 1000.0;
    cout << elapsedTime << endl;
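
As an aside, here is a variant of the timing around the mog2() call that uses CUDA events instead of gettimeofday. This is only a sketch (it assumes cuda_runtime.h is included and that mog2 runs on the default CUDA stream), but cudaEventElapsedTime should report the GPU time more directly:

    cudaEvent_t ev_start, ev_stop;
    cudaEventCreate(&ev_start);
    cudaEventCreate(&ev_stop);

    /* bracket the GPU call with events on the default stream */
    cudaEventRecord(ev_start, 0);
    mog2(d_frame, d_fgmask, mog2_param.learningCoef);
    cudaEventRecord(ev_stop, 0);
    cudaEventSynchronize(ev_stop);

    /* elapsed GPU time in milliseconds */
    float gpuMs = 0.0f;
    cudaEventElapsedTime(&gpuMs, ev_start, ev_stop);
    cout << gpuMs << endl;

    cudaEventDestroy(ev_start);
    cudaEventDestroy(ev_stop);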

Hi usajpn,

When we benchmark TK1 vs. TX1, we have seen both degradation and improvement in some cases, caused by synchronization at a deep software-architecture level.

To confirm whether this is the problem, you can check the execution logs from both boards and compare the time spent on the GPU. TX1 should be faster; if it is not, the slowdown is caused by GPU architecture changes, and the code is probably sub-optimal for it. If GPU execution on TX1 is faster, you can improve the pipeline by making better use of streaming and synchronization.
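
As a rough illustration only (assuming the OpenCV 2.4-style cv::gpu API that OpenCV4Tegra is based on, and reusing the cap, d_frame, d_fgmask, mog2, and mog2_param names from your snippet), the upload, MOG2, and download can be enqueued on a cv::gpu::Stream so they overlap with CPU work:

    cv::gpu::Stream stream;
    cv::Mat fgmask;

    /* enqueue the transfers and the kernel on one stream;
       nothing blocks until waitForCompletion() is called */
    stream.enqueueUpload(cap, d_frame);
    mog2(d_frame, d_fgmask, mog2_param.learningCoef, stream);
    stream.enqueueDownload(d_fgmask, fgmask);

    /* CPU work (e.g. grabbing the next frame) can run here */
    stream.waitForCompletion();

Note that the host buffers should be page-locked (cv::gpu::CudaMem) for the enqueued copies to be truly asynchronous.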

Please see the discussion in this other thread:
https://devtalk.nvidia.com/default/topic/978067/performance-comparision-tk1-vs-tx1/

Thanks

Hello kayccc,

Thank you for your response.

As I said, TX1 is faster.
Since OpenCV4Tegra is your software,
is there any way you and your development team
can confirm whether my results are reasonable?

Thank you.

Hi usajpn,

The prebuilt OpenCV4Tegra is a CPU- and GPU-optimized version of OpenCV for the Tegra architecture, so I suppose the result is reasonable. As a reference, you could run the same code with standard OpenCV and check whether it is any slower.
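
A CPU reference with the plain BackgroundSubtractorMOG2 can also help put the GPU numbers in context. As a sketch only (assuming the OpenCV 2.4-style API, with opencv2/video/background_segm.hpp included and reusing mog2_param from your code):

    cv::BackgroundSubtractorMOG2 mog2_cpu;
    cv::Mat frame, fgmask;

    /* same MOG2 step on the CPU; timing this per frame gives a
       baseline to judge the GPU speedup against */
    mog2_cpu(frame, fgmask, mog2_param.learningCoef);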

Besides, are you running the TX1 at maximum performance? Here is a link for reference:
http://elinux.org/Jetson/TX1_Controlling_Performance

And another wiki page on OpenCV performance:
http://elinux.org/Jetson/Computer_Vision_Performance

Thanks