AdaBoost face detection using CUDA

I am porting an AdaBoost face detection module to CUDA to improve performance. My design keeps the integral image in device (GPU) memory and has a CUDA kernel run the AdaBoost classifier on every image patch. My problem is that producing the integral image on the device is very slow because the image is large (5 MP). This step is even slower than searching for face candidates with AdaBoost. Many developers have built CUDA AdaBoost face detectors, so I wonder how they made it fast. How can I improve the performance of face detection with CUDA? I am frustrated. If you have developed one, please guide me. Thanks in advance.

The NPP library provides functions for generating integral images. I'm not saying I know how to do AdaBoost face detection or anything like that.
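For reference, here is a minimal CPU sketch in C++ of what an integral image computation has to produce. It is not CUDA code and not what NPP does internally; the names integral_image and box_sum are hypothetical, made up for this illustration. It splits the work into a prefix sum along each row followed by a prefix sum down each column. Every row in the first pass and every column in the second pass is independent of the others, which is the property a GPU version would typically exploit. The output is stored as 32-bit unsigned values, which is enough for a 5 MP 8-bit image (5,000,000 x 255 is about 1.3 billion).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: inclusive integral image of a row-major 8-bit image.
// out[y * width + x] holds the sum of all pixels in the rectangle (0,0)..(x,y).
std::vector<uint32_t> integral_image(const std::vector<uint8_t>& src,
                                     int width, int height) {
    std::vector<uint32_t> out(static_cast<std::size_t>(width) * height);

    // Pass 1: running sum along each row. Rows do not depend on each other.
    for (int y = 0; y < height; ++y) {
        uint32_t run = 0;
        for (int x = 0; x < width; ++x) {
            std::size_t i = static_cast<std::size_t>(y) * width + x;
            run += src[i];
            out[i] = run;
        }
    }

    // Pass 2: running sum down each column. Columns do not depend on each other.
    for (int x = 0; x < width; ++x) {
        for (int y = 1; y < height; ++y) {
            std::size_t i = static_cast<std::size_t>(y) * width + x;
            out[i] += out[i - width];
        }
    }
    return out;
}

// Hypothetical helper: sum of pixels in the rectangle with inclusive corners
// (x0,y0) and (x1,y1), using four lookups into the integral image.
uint32_t box_sum(const std::vector<uint32_t>& ii, int width,
                 int x0, int y0, int x1, int y1) {
    auto at = [&](int x, int y) -> uint32_t {
        // Anything left of or above the image contributes zero.
        if (x < 0 || y < 0) return 0;
        return ii[static_cast<std::size_t>(y) * width + x];
    };
    return at(x1, y1) - at(x1, y0 - 1) - at(x0 - 1, y1) + at(x0 - 1, y0 - 1);
}
```

A CPU version like this can also serve as a reference for checking the output of a GPU implementation on a small test image, although the output layout may differ (for example, some libraries pad the result with an extra row and column of zeros).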

Thanks anyway, txbob. It's also helpful information for me.