OpenCV face detection speedup in cvIntegral

Hi, I have been investigating whether OpenCV face detection can be sped up.

The main face detection work is done by the cvHaarDetectObjects function. Going through that function, most of it deals with data structures, and at one point it calls cvIntegral. That seems to be an important step in the face detection (is it the Haar transform?).

The cvIntegral function was not clear to me either, so I tried googling how it works. While doing that I came across a developer who has implemented cvIntegral in a simple manner:

Link (cvIntegral): http://opencv.blogspot.com/2005/04/cvintegral-on-32-bit-floating-point_16.html

I feel that making this code CUDA-enabled would help speed up the face detection, but frankly I am finding it tough to get into CUDA programming.

Can you help me get started? We could also work together. :hug: :-)

:magic:

Hi, I timed the cvIntegral and cvCanny calls together in the OpenCV facedetect demo, which uses the Lena image. Together the two functions took only about 5 to 6 ms of the roughly 200 ms average detection time.
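For anyone who wants to repeat that kind of measurement, here is a minimal sketch of a wall-clock timer in standard C++. The helper name elapsedMs is made up for illustration and is not part of OpenCV; you would wrap the cvIntegral or cvCanny call in the lambda yourself.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not an OpenCV function): runs fn once and returns
// the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsedMs(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A single run is noisy, so averaging over several calls is probably a good idea before comparing numbers.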

Then I counted the calls to cvRunHaarClassifierCascade for the same example: it is called about 184254 times for the Lena image.

It gets that many calls because the detector slides over the entire image and runs the cascade's pattern check for a face at every position.

Couldn't this be made to run in CUDA? Multiple processors could run the cascade comparison on different parts of the image simultaneously, which should give a very good speedup.

Does anyone have suggestions? Is it possible to convert cvRunHaarClassifierCascade into a parallel process?
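To make the idea concrete, here is a rough sketch in plain C++ of why the window scan looks parallelizable. It is not OpenCV's code: Stage, runCascade, scanImage and Hit are hypothetical names, and a stage is just a placeholder predicate standing in for the real classifier evaluation. The point is only that no window reads another window's result.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for one cascade stage: returns true if the window
// whose top-left corner is (x, y) passes the stage.
using Stage = std::function<bool(int x, int y)>;

// Run all stages on one window, rejecting at the first stage that fails.
bool runCascade(const std::vector<Stage>& stages, int x, int y) {
    for (const Stage& stage : stages) {
        if (!stage(x, y)) {
            return false;
        }
    }
    return true;
}

struct Hit {
    int x;
    int y;
};

// Slide a square window over the image. Each (x, y) iteration is independent
// of the others, so in principle each could be given to its own GPU thread.
std::vector<Hit> scanImage(const std::vector<Stage>& stages,
                           int imageW, int imageH, int windowSize, int step) {
    std::vector<Hit> hits;
    for (int y = 0; y + windowSize <= imageH; y += step) {
        for (int x = 0; x + windowSize <= imageW; x += step) {
            if (runCascade(stages, x, y)) {
                hits.push_back({x, y});
            }
        }
    }
    return hits;
}
```

One caveat with this shape: because of the early exit, windows do very different amounts of work, so neighbouring GPU threads would likely diverge. That may limit the speedup compared with a naive estimate.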

I came across this cool site that shows very nicely how the OpenCV facedetect demo works:

http://morph.cs.st-andrews.ac.uk/fof/haarDemo/index.html

Hi, I did some experimenting with gprof on the OpenCV facedetect demo and found the most time-consuming functions. Here is the top of the list:

  %    cumulative    self                 self      total
 time    seconds    seconds      calls   ms/call   ms/call  name
78.38       0.29       0.29    3101044      0.00      0.00  icvEvalHidHaarClassifier(CvHidHaarClassifier*, double, unsigned int)
10.81       0.33       0.04     184588      0.00      0.00  cvRunHaarClassifierCascade
 2.70       0.34       0.01      40689      0.00      0.00  icvXMLParseTag(CvFileStorage*, char*, CvStringHashNode**, CvAttrList**, int*)

It is a huge list, actually, so I am attaching the full profile here.

Can CUDA be used to parallelize the algorithm?
gprofile.txt (46.8 KB)

Hi,

Is anybody still in need of this? I am actually doing a project porting Viola/Jones to CUDA. I hope I get it right.

Hi dbancajas, I’m really interested in having it ported to CUDA. Are you still around?

Thanks in advance!

Hi,
Has anyone managed to parallelize OpenCV face detection?

I am trying to write an OpenCL version of OpenCV face detection. Any suggestions would be very helpful!

Thank you very much :)

Integral images are also known as summed-area tables (SATs), and are being used here to quickly sum all the values in a particular rectangle. You can compute SATs efficiently in CUDA using parallel prefix sum; there’s a demo of this in the CUDPP library:

http://gpgpu.org/static/developer/cudpp/rel/cudpp_1.1/html/index.html
http://en.wikipedia.org/wiki/Summed_area_table
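In case it helps to see it spelled out, here is a small CPU-side sketch in C++ of a summed-area table and the four-lookup rectangle sum. The names IntegralImage, buildIntegral and rectSum are mine, not OpenCV's or CUDPP's. It is written as a prefix sum along rows followed by a prefix sum down columns, which is the structure a parallel scan would exploit. The padded layout with an extra zero row and column matches the (W+1) x (H+1) output of cvIntegral.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summed-area table padded with one zero row and one zero column,
// stored row-major. Names here are illustrative, not an OpenCV API.
struct IntegralImage {
    std::size_t width;           // source image width
    std::size_t height;          // source image height
    std::vector<long long> sat;  // (width + 1) * (height + 1) entries

    long long at(std::size_t x, std::size_t y) const {
        return sat[y * (width + 1) + x];
    }
};

IntegralImage buildIntegral(const std::vector<unsigned char>& pixels,
                            std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    IntegralImage ii{width, height,
                     std::vector<long long>((width + 1) * (height + 1), 0)};
    const std::size_t stride = width + 1;
    // Pass 1: running sum along each row (rows are independent of each other).
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            ii.sat[(y + 1) * stride + (x + 1)] =
                ii.sat[(y + 1) * stride + x] + pixels[y * width + x];
        }
    }
    // Pass 2: running sum down each column (columns are independent too).
    for (std::size_t y = 1; y <= height; ++y) {
        for (std::size_t x = 1; x <= width; ++x) {
            ii.sat[y * stride + x] += ii.sat[(y - 1) * stride + x];
        }
    }
    return ii;
}

// Sum of the pixels in the rectangle [x, x + w) x [y, y + h), using four
// table lookups regardless of the rectangle's size.
long long rectSum(const IntegralImage& ii, std::size_t x, std::size_t y,
                  std::size_t w, std::size_t h) {
    return ii.at(x + w, y + h) - ii.at(x, y + h)
         - ii.at(x + w, y) + ii.at(x, y);
}
```

For a 3x3 image holding the values 1 to 9, the bottom-right 2x2 block sums to 5 + 6 + 8 + 9 = 28, which rectSum(ii, 1, 1, 2, 2) reproduces from the table.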

I agree it would be cool to have a fast face detection demo in CUDA!

There is also a summed area table example in the Thrust examples.
http://code.google.com/p/thrust/

Download the examples .zip and check out:
summed_area_table.cu