Using OpenCV GPU HOG with Jetson TK1

Hi everyone,

I'm doing some image processing tasks on my Jetson. For this I use the OpenCV GPU HOGDescriptor with its detectMultiScale function. The board is running L4T 21.3 with OpenCV4Tegra 2.4.10.1. All four CPU cores are active and running constantly at their maximum frequency, as is the GPU.

Everything looks fine, both the speed-up over the CPU HOG on the Jetson and the detection results; only the processing time is causing problems.

In detail: for my application the 192 CUDA cores need about 100 ms for the GPU detectMultiScale call. But this duration is not constant, as one would usually expect from the HOG algorithm. It jumps around between 100 ms and 160 ms, i.e. up to 60% more.

If I run my application on an Intel i5 with a GTX 960, the GPU processing time is almost constant (at most +5%).

Does this happen on the Jetson because it has only 1 SM, so there might be much more overhead compared to the 8 SMs of the GTX 960?

I've also disabled the L4T GUI and run the application again; there is no change.

Could someone test this on their own Jetson?

Here are the important code snippets and facts:

  • For the tests I always used the same picture
  • The source picture is 1024x414 pixels
  • I use the built-in getDaimlerPeopleDetector SVM
  • p_HogWinHeight = 96
  • p_HogScaleLevels = 15
  • p_HogThreshold = 1.5
  • p_HogMultiScalefac = 1.10
//** Initialize OpenCV GPU HOG **
// win_size, block_size, block_stride, cell_size, nbins, win_sigma, L2-Hys threshold, gamma correction, nlevels
hog = gpu::HOGDescriptor(Size(p_HogWinHeight / 2, p_HogWinHeight), Size(16, 16), Size(8, 8), Size(8, 8), 9, -1.0, 0.2, 1, p_HogScaleLevels);

hog.setSVMDetector(cv::HOGDescriptor::getDaimlerPeopleDetector());

...

gpuImg.upload(img_crop);

//** OpenCV time measurement **
TickMeter tm; tm.start(); // tic

hog.detectMultiScale(gpuImgSmall, hogUnfiltered, p_HogThreshold, Size(8, 8), Size(0, 0), p_HogMultiScalefac, 2);

tm.stop(); cout << " Detector(ms): " << tm.getTimeMilli() << endl; // toc
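
In case someone wants to reproduce this, here is a stripped-down, self-contained sketch of my timing test (the image path and the number of iterations are placeholders; the HOG parameters are the values listed above). It runs the detector repeatedly on the same picture and prints min/mean/max, which is where I see the jitter:

// Minimal reproduction sketch, built against OpenCV4Tegra 2.4.x with the gpu module,
// e.g.: g++ hog_timing.cpp -o hog_timing `pkg-config --cflags --libs opencv`
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <algorithm>
#include <iostream>
#include <vector>

int main(int argc, char** argv)
{
    const int    p_HogWinHeight     = 96;
    const int    p_HogScaleLevels   = 15;
    const double p_HogThreshold     = 1.5;
    const double p_HogMultiScalefac = 1.10;

    // Placeholder path; the test picture is 1024x414, loaded as 8-bit grayscale (CV_8UC1).
    cv::Mat img = cv::imread(argc > 1 ? argv[1] : "test_1024x414.png", 0);
    if (img.empty()) { std::cerr << "Could not load image" << std::endl; return 1; }

    // Same HOG configuration as above: 48x96 window, 16x16 blocks, 8x8 stride/cells, 9 bins.
    cv::gpu::HOGDescriptor hog(cv::Size(p_HogWinHeight / 2, p_HogWinHeight),
                               cv::Size(16, 16), cv::Size(8, 8), cv::Size(8, 8),
                               9, -1.0, 0.2, 1, p_HogScaleLevels);
    hog.setSVMDetector(cv::HOGDescriptor::getDaimlerPeopleDetector());

    cv::gpu::GpuMat gpuImg;
    gpuImg.upload(img);

    std::vector<cv::Rect> found;
    const int warmup = 5, runs = 100;

    // Warm-up so that one-time CUDA initialization does not distort the statistics.
    for (int i = 0; i < warmup; ++i)
        hog.detectMultiScale(gpuImg, found, p_HogThreshold, cv::Size(8, 8),
                             cv::Size(0, 0), p_HogMultiScalefac, 2);

    double minMs = 1e9, maxMs = 0.0, sumMs = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        int64 t0 = cv::getTickCount();
        hog.detectMultiScale(gpuImg, found, p_HogThreshold, cv::Size(8, 8),
                             cv::Size(0, 0), p_HogMultiScalefac, 2);
        double ms = (cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();
        minMs = std::min(minMs, ms);
        maxMs = std::max(maxMs, ms);
        sumMs += ms;
    }

    std::cout << "Detector(ms) min/mean/max: " << minMs << " / "
              << sumMs / runs << " / " << maxMs << std::endl;
    return 0;
}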

Hi,

The GPU in Tegra K1 is almost identical to a desktop Kepler GPU, but the CPU and memory subsystems are completely different, so you should expect to see different performance behavior on desktop vs. mobile.

I'm not sure what is affecting your performance specifically, but note that Tegra K1 has many levels of speed, voltage and temperature throttling, so it is common for measured speeds on Tegra to vary over time, even if you perform the exact same operation continuously. There are various ways to make the timing more consistent on Tegra K1 (see [url]http://elinux.org/Jetson/Performance[/url]).
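
One quick sanity check is to read the current GPU clock right around each measurement and see whether the slow frames coincide with a lower clock. Just a sketch: the debugfs path below is the one listed on that wiki page for the TK1, so treat it as an assumption for your particular L4T release, and note that it needs root and a mounted debugfs:

// Sketch: read the current GPU (gbus) clock from debugfs around each timed detection.
// The path is taken from the Jetson/Performance wiki page and may differ between
// L4T releases; reading it usually requires root and a mounted debugfs.
#include <fstream>
#include <iostream>

static long readGpuClockHz()
{
    std::ifstream f("/sys/kernel/debug/clock/gbus/rate");  // assumed path, see wiki
    long hz = -1;
    if (f) f >> hz;
    return hz;  // -1 if the file could not be read
}

// Usage around the existing timing code:
//   long clkBefore = readGpuClockHz();
//   ... run hog.detectMultiScale(...) and measure the time ...
//   long clkAfter  = readGpuClockHz();
//   std::cout << "GPU clock before/after (Hz): " << clkBefore << " / " << clkAfter << std::endl;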

Cheers,
Shervin.

Hi,

thanks for your answer. One of the first things I created was a little script to unleash the full power of the K1, with the help of the wiki from your post. I also have a little GUI monitoring clocks and temperatures, so the clocks are always at their maximum.

With the profiler I monitored GPU activity and the running tasks over time. The CUDA kernels from hog.cu in OpenCV (for example compute_hists_kernel_many_blocks or normalize_hists_kernel_many_blocks ...) always take the same time. What differs between frames is the cudaDeviceSynchronize task: its duration fluctuates dramatically (from 500 us up to over 4 ms). With 15 multiscale levels for the HOG, this seems to be the problem.
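
To narrow this down further, I plan to bracket the detectMultiScale call with CUDA events and compare the event time with the wall-clock time, to see whether the extra milliseconds show up on the GPU timeline or only on the host side while waiting in cudaDeviceSynchronize. Just a sketch, using the variables from the snippet above and linked against the CUDA runtime:

// Sketch: compare GPU-timeline time (CUDA events) with host wall-clock time for one call.
// hog, gpuImgSmall, hogUnfiltered and the p_Hog* parameters are the ones from the snippet above.
#include <cuda_runtime.h>

cudaEvent_t evStart, evStop;
cudaEventCreate(&evStart);
cudaEventCreate(&evStop);

int64 t0 = cv::getTickCount();
cudaEventRecord(evStart, 0);  // record on the default stream, which the gpu HOG uses

hog.detectMultiScale(gpuImgSmall, hogUnfiltered, p_HogThreshold, Size(8, 8), Size(0, 0), p_HogMultiScalefac, 2);

cudaEventRecord(evStop, 0);
cudaEventSynchronize(evStop);

float gpuMs = 0.0f;
cudaEventElapsedTime(&gpuMs, evStart, evStop);
double wallMs = (cv::getTickCount() - t0) * 1000.0 / cv::getTickFrequency();

cout << "GPU timeline (ms): " << gpuMs << "  wall clock (ms): " << wallMs << endl;

cudaEventDestroy(evStart);
cudaEventDestroy(evStop);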

The OpenCV CUDA HOG is written very generically. I think a single SM with 192 cores is not the preferred platform for it. I'm not sure if it's worth the effort to optimize the source code specifically for the K1. The Tegra X1, with its 2 SMs (128 cores per SM), can handle this better. I hope there will be a Jetson board with the Tegra X1 soon!