Execution Time on Stitching for Jetson TK1 using OpenCV 3.2 - stitching_detailed.cpp

Hi!

I want to use the Jetson TK1 to stitch the images from 4 cameras in real time. When I run the example stitching_detailed.cpp with the GPU enabled, each iteration of the loop takes more than one second. That loop spans lines 763 to 895.

Is this the maximum speed the TK1 can reach? Can I increase it?

I have installed OpenCV 3.2 with the extra libraries and CUDA 6.5.

Hi Ivan_Rios,

What is the resolution of your camera output? How does the stitched image look?

What are the input and output frame rates?

Could you describe your requirements?

Hi Wayne,

The resolution of the camera input is 768x576 and the output image looks great. If I change some flags, I can reduce the loop time to 0.5 seconds at the cost of some quality. The quality is acceptable, but the loop time is still too long. (--try_cuda yes --blend_strength 0 --blend feather --warp cylindrical)

For now I am using four .png files to keep the example simple, so my “frame rate” is one stitched image per loop iteration.

In the end I want a stitched video at more than 10 frames per second, and I want to identify the keypoints and keep track of them. Can I achieve this with the TK1?
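To put the 10 frames per second target next to the measured loop times, here is a minimal sketch in plain C++. The helper names (fps_from_loop_time, frame_budget_ms, time_once) are hypothetical and not part of stitching_detailed.cpp; time_once assumes the loop body can be wrapped in a callable.

```cpp
#include <cassert>
#include <chrono>

// Frames per second implied by one loop iteration taking `seconds`.
double fps_from_loop_time(double seconds) {
    assert(seconds > 0.0);
    return 1.0 / seconds;
}

// Time available per frame, in milliseconds, for a target frame rate.
double frame_budget_ms(double target_fps) {
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

// Runs `body` once and returns the elapsed wall-clock time in seconds.
// `body` is a hypothetical stand-in for one iteration of the stitching loop.
template <typename F>
double time_once(F&& body) {
    const auto start = std::chrono::steady_clock::now();
    body();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With the 0.5 second loop time mentioned above, fps_from_loop_time(0.5) gives 2 frames per second, while frame_budget_ms(10.0) gives a budget of 100 ms per stitched frame.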

I don’t know how to fully exploit the potential of its GPU in this case. I write and compile the code on the TK1 itself, using the Ubuntu graphical environment. I use g++ 4.8 to compile the .cpp file.

Thank you so much!

What GPU usage does tegrastats report?

sudo ./tegrastats

When it is in the loop:

RAM 1297/1892MB (lfb 4x4MB) cpu [64%,61%,off,off]@-1 VDE 0 EDP limit 0

Please use “sudo” so that the GR3D usage is reported.

I launch:
“sudo ./stitching_detailed (with all of the parameters)”

and I get:
“modprobe: FATAL: Module nvidia not found.”

The loop time and performance are unchanged.

Sorry for the confusion. What I meant was:
sudo ./tegrastats

Ah, ok.

This is what I get with sudo ./tegrastats:

RAM 1357/1892MB (lfb 1x2MB) cpu [75%,off,off,off]@1887 EMC 13%@924 AVP 0%@204 VDE 120 GR3D 0%@180 EDP limit 0
RAM 1362/1892MB (lfb 1x2MB) cpu [58%,74%,off,off]@2065 EMC 13%@924 AVP 0%@204 VDE 120 GR3D 0%@108 EDP limit 0
RAM 1353/1892MB (lfb 1x2MB) cpu [50%,off,off,off]@1092 EMC 21%@600 AVP 0%@204 VDE 120 GR3D 1%@180 EDP limit 0
RAM 1357/1892MB (lfb 1x2MB) cpu [40%,75%,off,off]@1326 EMC 14%@924 AVP 0%@204 VDE 120 GR3D 0%@252 EDP limit 0
RAM 1360/1892MB (lfb 1x2MB) cpu [52%,60%,off,off]@2065 EMC 13%@924 AVP 0%@204 VDE 120 GR3D 69%@108 EDP limit 0
RAM 1359/1892MB (lfb 1x2MB) cpu [46%,61%,off,off]@1326 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 0%@108 EDP limit 0
RAM 1358/1892MB (lfb 1x2MB) cpu [58%,53%,off,off]@2065 EMC 13%@924 AVP 0%@204 VDE 120 GR3D 0%@180 EDP limit 0
RAM 1358/1892MB (lfb 1x2MB) cpu [62%,off,off,off]@1938 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 95%@72 EDP limit 0
RAM 1361/1892MB (lfb 1x2MB) cpu [77%,off,off,off]@1734 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 64%@108 EDP limit 0
RAM 1355/1892MB (lfb 1x2MB) cpu [52%,off,off,off]@2065 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 67%@396 EDP limit 0
RAM 1361/1892MB (lfb 1x2MB) cpu [67%,off,off,off]@2065 EMC 12%@924 AVP 0%@204 VDE 120 GR3D 78%@108 EDP limit 0
RAM 1358/1892MB (lfb 1x2MB) cpu [62%,65%,off,off]@1530 EMC 13%@924 AVP 0%@204 VDE 120 GR3D 0%@72 EDP limit 0
RAM 1358/1892MB (lfb 1x2MB) cpu [51%,62%,off,off]@2065 EMC 13%@924 AVP 0%@204 VDE 120 GR3D 20%@252 EDP limit 0
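The GR3D field in lines like these can also be read programmatically, which makes it easier to compare runs. The following is a minimal sketch, assuming the field always has the form `GR3D <percent>%@<MHz>` as seen in this log; the names Gr3dSample, parse_gr3d and mean_gr3d_percent are hypothetical and not part of tegrastats or OpenCV.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

struct Gr3dSample {
    int percent;  // GPU utilization in percent
    int mhz;      // GPU clock in MHz
};

// Extracts the "GR3D <percent>%@<MHz>" field from one tegrastats line.
// Returns false if the field is absent (e.g. tegrastats run without sudo).
bool parse_gr3d(const std::string& line, Gr3dSample& out) {
    const char* field = std::strstr(line.c_str(), "GR3D ");
    if (field == nullptr) return false;
    return std::sscanf(field, "GR3D %d%%@%d", &out.percent, &out.mhz) == 2;
}

// Mean GPU utilization over all lines that contain a GR3D field.
// Returns -1.0 if no line could be parsed.
double mean_gr3d_percent(const std::vector<std::string>& lines) {
    long sum = 0;
    int count = 0;
    for (const std::string& line : lines) {
        Gr3dSample s;
        if (parse_gr3d(line, s)) {
            sum += s.percent;
            ++count;
        }
    }
    return count == 0 ? -1.0 : static_cast<double>(sum) / count;
}
```

Feeding the captured lines to mean_gr3d_percent gives a single utilization figure per run, and the mhz value shows whether the GPU clock stayed at a low frequency while the loop was executing.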

It seems that your GPU clock is not raised to its maximum while the app is running.

Please try the following method to raise it, then run your app again.

http://elinux.org/Jetson/Performance#Controlling_GPU_performance

I’ve set the GPU and CPU frequencies to their maximum (GPU: 852 MHz; CPU: following http://elinux.org/Jetson/Performance#Maximizing_CPU_performance).

Now the loop time is 0.3 seconds, and sudo ./tegrastats reports this:

RAM 1458/1892MB (lfb 1x4MB) cpu [19%,46%,25%,28%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1456/1892MB (lfb 1x4MB) cpu [25%,14%,53%,25%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 1%@852 EDP limit 0
RAM 1454/1892MB (lfb 1x4MB) cpu [31%,44%,15%,25%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1449/1892MB (lfb 1x4MB) cpu [29%,37%,14%,40%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1454/1892MB (lfb 1x4MB) cpu [49%,34%,32%,16%]@2065 EMC 18%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1455/1892MB (lfb 1x4MB) cpu [53%,26%,17%,23%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1459/1892MB (lfb 1x4MB) cpu [23%,53%,18%,25%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1455/1892MB (lfb 1x4MB) cpu [31%,52%,11%,25%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1454/1892MB (lfb 1x4MB) cpu [35%,57%,13%,10%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 26%@852 EDP limit 0
RAM 1459/1892MB (lfb 1x4MB) cpu [33%,46%,25%,10%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1450/1892MB (lfb 1x4MB) cpu [34%,53%,13%,10%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1455/1892MB (lfb 1x4MB) cpu [38%,47%,21%,12%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 9%@852 EDP limit 0
RAM 1459/1892MB (lfb 1x4MB) cpu [34%,41%,12%,31%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 9%@852 EDP limit 0
RAM 1452/1892MB (lfb 1x4MB) cpu [29%,22%,24%,43%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 48%@852 EDP limit 0
RAM 1455/1892MB (lfb 1x4MB) cpu [43%,39%,16%,18%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 26%@852 EDP limit 0
RAM 1460/1892MB (lfb 1x4MB) cpu [53%,38%,14%,11%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 20%@852 EDP limit 0
RAM 1456/1892MB (lfb 1x4MB) cpu [33%,15%,21%,48%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 23%@852 EDP limit 0
RAM 1457/1892MB (lfb 1x4MB) cpu [48%,20%,20%,27%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 18%@852 EDP limit 0
RAM 1451/1892MB (lfb 1x4MB) cpu [28%,35%,13%,37%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 0%@852 EDP limit 0
RAM 1456/1892MB (lfb 1x4MB) cpu [56%,14%,17%,36%]@2065 EMC 16%@924 AVP 0%@204 VDE 120 GR3D 25%@852 EDP limit 0
RAM 1459/1892MB (lfb 1x4MB) cpu [60%,29%,13%,25%]@2065 EMC 17%@924 AVP 0%@204 VDE 120 GR3D 17%@852 EDP limit 0
RAM 1460/1892MB (lfb 1x4MB) cpu [38%,37%,43%,18%]@2065 EMC 18%@924 AVP 0%@204 VDE 120 GR3D 25%@852 EDP limit 0
RAM 1452/1892MB (lfb 1x4MB) cpu [39%,18%,41%,36%]@2065 EMC 18%@924 AVP 0%@204 VDE 120 GR3D 41%@852 EDP limit 0

Is this the best performance I can get from the looped stitching_detailed.cpp?
Is there any way to make the CPU functions faster?

Thank you so much!!

I see that your CPU and GPU usage are both below 100%. At this point you should try modifying your code to make it more efficient.

Ok, thank you so much!