Memory management on the TX2 and OpenCV

I’m currently looking into getting a new embedded system for some image processing applications. I want to try out the TX2, but I would like to know some things I couldn’t find in the forums or data sheets.

Basically I want to know if going for the TX2 is better than my alternatives.

I’m currently using OpenCV on a National Instruments CVS 1458-RT. I cross-compile my algorithm as an .so on my Windows machine and load it into LabVIEW as external code. The CVS has a 64-bit Intel Atom.

So far, running my algorithm with OpenCV functions cuts my processing time to about 1/3 of what it takes with LabVIEW functions. However, I will be working on a much more processing-intensive algorithm (using correlation) and need to run it at 24-30 fps, preferably on HD (720p) frames.
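To put that target in numbers, here is a rough back-of-the-envelope sketch. It assumes 720p means 1280x720 frames; the function names are just illustrative, and it only computes the per-frame time budget and pixel rate, not how long any particular algorithm would take.

```python
# Back-of-the-envelope budget for 720p video at 24-30 fps.
# Assumption: "720p" means 1280x720 pixels per frame.
WIDTH, HEIGHT = 1280, 720


def frame_budget_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def pixel_rate(fps, width=WIDTH, height=HEIGHT):
    """Number of pixels that must be processed per second."""
    return width * height * fps


if __name__ == "__main__":
    for fps in (24, 30):
        print(f"{fps} fps: {frame_budget_ms(fps):.1f} ms per frame, "
              f"{pixel_rate(fps) / 1e6:.1f} Mpixel/s")
```

So whatever the hardware, the whole pipeline (capture, any memory copies, the correlation itself) has to fit in roughly 33-42 ms per frame.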

I know that using CUDA has a big advantage with large images, and that for smaller ones memory management can become a bottleneck. But I don’t know anything about the Tegra-optimized OpenCV that the Jetson uses. Can anyone share more info on this? How would it compare to running OpenCV on the Intel Atom?

You might find that LabVIEW (or at least parts of it) has no compatibility with aarch64/arm64/ARMv8, so I’d first see which parts of LabVIEW might need to run directly on the Jetson. If it is a case of piping some output from the Jetson to another computer, and that other computer is the only part needing LabVIEW, then that in itself won’t be an issue.

As far as video processing speed goes, for most workloads the TX2 is the king of the embedded world.

If you can actually use CUDA, then it’s likely to be much faster than using the Intel Atom.
The Intel Atom is sluggish by Intel standards; plain ARM CPUs at the same frequency will often perform just as well as or better than the equivalent Atom.
Running on the GPU will improve speed by a lot, as long as there is a lot of parallelism. (If you do large matrix multiplies, convolutions, or similar correlation functions, it should be very fast.)

One way of finding out whether this will work is to try it on a desktop machine with an NVIDIA graphics card.
Try running CPU-only on a desktop machine.
Then, try running with CUDA on the GPU on that same desktop machine.
Clearly, the desktop machine will have better performance than the Jetson (hundreds of Watts versus 7-13 Watts of power) but the relative difference between CPU and GPU should tell you how amenable your algorithm is to this optimization.
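A minimal sketch of how that comparison could be timed. The helper names `median_runtime` and `speedup` are made up for illustration, and `run_cpu` / `run_gpu` in the usage note stand for your own CPU-only and CUDA versions of the algorithm; nothing here is specific to OpenCV.

```python
import statistics
import time


def median_runtime(fn, frame, repeats=20, warmup=3):
    """Median wall-clock time in seconds of fn(frame).

    Warm-up calls are discarded so one-time costs (allocation,
    GPU context creation) do not skew the result.
    """
    for _ in range(warmup):
        fn(frame)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(frame)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU path is than the CPU path."""
    return cpu_seconds / gpu_seconds
```

Usage would look something like `speedup(median_runtime(run_cpu, frame), median_runtime(run_gpu, frame))`. For a fair number, the GPU function should include the upload to and download from device memory and should wait for the GPU to finish before returning; otherwise you only time the kernel launch.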

I’m actually looking to get away from LabVIEW. I racked my brain getting OpenCV to run inside the embedded CVS. The CVS still limits what I can do with OpenCV, but even so, OpenCV is much, much faster than the functions LabVIEW offers. Also, the Jetson is way cheaper than the CVS. I understand that running an algorithm using only LabVIEW on NI hardware is much “safer”, but I don’t need that.

Thank you. I think I will buy one and give it a try. I have used CUDA on my laptop. There is a difference when using large images, not so much when using small ones. But it has a 6th-gen i7 and a GTX 960M with 4 GB of VRAM. It sure is beefier than the Jetson, but it’s also a power hog compared to an embedded system.

Since CUDA does many things in parallel, you will probably find that the speed boost increases significantly if you use very large images, or if you load several images at once and run the operations on the larger data set simultaneously.
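A toy model of why image size matters so much. Every number below (throughputs and the fixed per-call overhead) is an invented placeholder, not a measurement of any real device; the point is only the shape of the curve: a fixed transfer/launch cost dominates small images and gets amortized over large ones.

```python
# Toy cost model: all default values are made-up placeholders,
# chosen only to illustrate the trend, not measured on any hardware.

def cpu_time_ms(pixels, cpu_mpix_per_ms=0.05):
    """Modelled CPU time: proportional to pixel count."""
    return pixels / 1e6 / cpu_mpix_per_ms


def gpu_time_ms(pixels, gpu_mpix_per_ms=0.5, overhead_ms=2.0):
    """Modelled GPU time: fixed transfer/launch overhead plus compute."""
    return overhead_ms + pixels / 1e6 / gpu_mpix_per_ms


def modelled_speedup(pixels):
    """CPU time divided by GPU time under the toy model."""
    return cpu_time_ms(pixels) / gpu_time_ms(pixels)


if __name__ == "__main__":
    for width, height in ((160, 120), (640, 480), (1280, 720)):
        s = modelled_speedup(width * height)
        print(f"{width}x{height}: modelled speedup {s:.2f}x")
```

With these placeholder values the smallest image comes out slower on the GPU than on the CPU, while the 720p frame comes out several times faster. Plug in timings from your own hardware to see where the crossover actually lies.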