I'm currently looking into getting a new embedded system for some image-processing applications. I want to try out the TX2, but I'd like to know a few things I couldn't find in the forums or data sheets.
Basically I want to know if going for the TX2 is better than my alternatives.
I'm currently using OpenCV on a National Instruments CVS 1458-RT. I cross-compile my algorithm as an .so on my Windows machine and load it into LabVIEW as external code. The CVS has a 64-bit Intel Atom architecture.
So far, running my algorithm with OpenCV functions cuts my processing time to about a third of what it was with LabVIEW functions. However, I will be working on a much more processing-intensive algorithm (using correlation) and need to run it at 24-30 fps, preferably at HD (720p).
I know that using CUDA is a big advantage with large images, and that for smaller ones there can be a bottleneck with memory management. But I don't know anything about the Tegra-optimized OpenCV build that the Jetson uses. Can anyone share more info on this? How would it compare to running OpenCV on the Intel Atom?
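Just for context, by the memory-management bottleneck I mean the per-frame host/device copies. This is roughly how I've been measuring it on my laptop (only a sketch, assuming an OpenCV build with the CUDA modules enabled; the 720p frame is synthetic):

```cpp
// Sketch: measuring how much per-frame time goes into host<->device copies.
// Assumes an OpenCV build with the CUDA modules enabled.
#include <opencv2/opencv.hpp>
#include <opencv2/core/cuda.hpp>
#include <iostream>

int main()
{
    // Synthetic 720p grayscale frame, the size I am targeting.
    cv::Mat frame(720, 1280, CV_8UC1);
    cv::randu(frame, cv::Scalar::all(0), cv::Scalar::all(255));

    cv::cuda::GpuMat dFrame;
    cv::Mat back;

    int64 t0 = cv::getTickCount();
    dFrame.upload(frame);      // host -> device copy
    dFrame.download(back);     // device -> host copy
    double copyMs = (cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();

    std::cout << "Round-trip copy for one 720p frame: " << copyMs << " ms" << std::endl;
    return 0;
}
```

With small images this copy overhead ends up being a big share of the total time, which is why I'm wondering how the Tegra-optimized build handles it.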
You might find that LabVIEW (or at least parts of it) has no compatibility with aarch64/arm64/ARMv8, so I'd first check which parts of LabVIEW would need to run directly on the Jetson. If all you need is output from the Jetson piped to another computer, and that other computer is the only part that needs LabVIEW, then that in itself won't be an issue.
As far as video-processing speed goes, the TX2 is the king of the embedded world.
If you can actually use CUDA, then it’s likely to be much faster than using the Intel Atom.
The Intel Atom is sluggish by Intel standards; a plain ARM CPU at the same frequency will often perform just as well as or better than the equivalent Atom.
Running on the GPU will improve speed by a lot, as long as there is a lot of parallelism. (If you do large matrix multiplies, convolutions, or other such correlation functions, it should be very fast.)
One way of knowing whether this will work is to try it on a desktop machine with an NVIDIA graphics card.
Try running CPU-only on a desktop machine.
Then, try running with CUDA on the GPU on that same desktop machine.
Clearly, the desktop machine will have better performance than the Jetson (hundreds of watts versus 7-13 watts of power), but the relative difference between CPU and GPU should tell you how amenable your algorithm is to this optimization.
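If it helps, here's roughly what that desktop comparison could look like for a correlation-style workload, using OpenCV's template matching on the CPU and through the CUDA module (just a sketch, assuming an OpenCV 3.x build with the CUDA modules enabled; "frame.png" and "templ.png" are placeholder file names):

```cpp
// Rough CPU-vs-CUDA timing sketch for a correlation-style workload.
// Assumes OpenCV 3.x built with the CUDA modules (cudaimgproc).
#include <opencv2/opencv.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <iostream>

int main()
{
    // Placeholder file names -- substitute a representative 720p frame and template.
    cv::Mat frame = cv::imread("frame.png", cv::IMREAD_GRAYSCALE);
    cv::Mat templ = cv::imread("templ.png", cv::IMREAD_GRAYSCALE);

    cv::Mat result;
    int64 t0 = cv::getTickCount();
    cv::matchTemplate(frame, templ, result, cv::TM_CCORR_NORMED);   // CPU path
    double cpuMs = (cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();

    // GPU path: upload once, correlate on the device, download the score map.
    cv::cuda::GpuMat dFrame(frame), dTempl(templ), dResult;
    cv::Ptr<cv::cuda::TemplateMatching> matcher =
        cv::cuda::createTemplateMatching(CV_8U, cv::TM_CCORR_NORMED);

    t0 = cv::getTickCount();
    matcher->match(dFrame, dTempl, dResult);
    dResult.download(result);                                       // includes the copy back
    double gpuMs = (cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();

    std::cout << "CPU: " << cpuMs << " ms, CUDA: " << gpuMs << " ms" << std::endl;
    return 0;
}
```

The ratio between the two numbers on your desktop is a reasonable first indicator of how much the Jetson's GPU can help your particular algorithm.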
I'm actually looking to get away from LabVIEW. I broke my head getting OpenCV to run inside the embedded CVS. The CVS still limits what I can do with OpenCV, but even so, OpenCV is much, much faster than the functions LabVIEW offers. Also, the Jetson is way cheaper than the CVS. I understand that running an algorithm using only LabVIEW on NI hardware is much "safer", but I'm not in need of that.
Thank you. I think I will buy one and give it a try. I have used CUDA on my laptop: there is a difference with large images, not so much with small ones. But that machine has a 6th-gen i7 and a GTX 960M with 4 GB of VRAM. It's certainly beefier than the Jetson, but it's also a power hog compared to an embedded system.
Since CUDA does many things in parallel, you will probably find the speed boost increases significantly if you use very large images, or if you load several images at once and operate on a larger data set simultaneously.
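To give a rough idea, this is the kind of pattern I mean: queueing several frames on separate CUDA streams so the uploads and correlation kernels of different frames can overlap (again only a sketch, assuming OpenCV's CUDA modules; the frame0.png..frame3.png file names are placeholders):

```cpp
// Sketch: pushing several frames through the GPU at once on separate CUDA
// streams so transfers and kernels from different frames can overlap.
// Assumes OpenCV built with the CUDA modules; file names are placeholders.
#include <opencv2/opencv.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <string>
#include <vector>

int main()
{
    const std::vector<std::string> files = {"frame0.png", "frame1.png",
                                            "frame2.png", "frame3.png"};

    cv::Mat templ = cv::imread("templ.png", cv::IMREAD_GRAYSCALE);
    cv::cuda::GpuMat dTempl(templ);

    // One stream, device buffer, and matcher per in-flight frame.
    std::vector<cv::cuda::Stream> streams(files.size());
    std::vector<cv::cuda::GpuMat> dFrames(files.size()), dResults(files.size());
    std::vector<cv::Ptr<cv::cuda::TemplateMatching>> matchers;
    for (size_t i = 0; i < files.size(); ++i)
        matchers.push_back(cv::cuda::createTemplateMatching(CV_8U, cv::TM_CCORR_NORMED));

    // Load the host frames up front.
    std::vector<cv::Mat> frames;
    for (const std::string& f : files)
        frames.push_back(cv::imread(f, cv::IMREAD_GRAYSCALE));

    // Queue upload + correlation for every frame without waiting in between.
    // (For fully asynchronous copies the host buffers would also need to be
    // page-locked, e.g. via cv::cuda::HostMem.)
    for (size_t i = 0; i < frames.size(); ++i) {
        dFrames[i].upload(frames[i], streams[i]);
        matchers[i]->match(dFrames[i], dTempl, dResults[i], streams[i]);
    }

    // Collect the score maps once all streams have finished.
    for (size_t i = 0; i < frames.size(); ++i) {
        cv::Mat result;
        dResults[i].download(result, streams[i]);
        streams[i].waitForCompletion();
        // ... find the correlation peak in 'result' for frame i ...
    }
    return 0;
}
```

Keeping several frames in flight like this helps hide the per-frame copy overhead that otherwise dominates with small images.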