Big performance problem when integrating a .cu file into a .cpp video capture application


I’m a French student working on a video capture project. I use several cameras and run some processing on their images.

To begin, I wrote a .cu file that implements my processing on images that have already been read into CPU memory. I read them, copy them to the GPU (Quadro FX4600), run my processing, and copy the results back to host memory. The execution time is about 1.0 s for 30 images (real time).

Then I integrated this file into a C++ project using the extern "C" method. In my program, one thread grabs images from the camera, and they are then passed to the kernel. But it takes about 26 s to generate 1 image! The CUDA-integrated program is much, much slower than the CPU version (quad-core Xeon), which runs at about 20 Hz. So I have two questions:

  1. Could it be due to compilation options? If so, could you give me the proper options to use in a VS2005 project?
  2. Could it be due to CPU thread management?
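
To make the setup concrete, here is a minimal sketch of the integration described above, with a timer placed around the processing call alone so that its cost can be separated from the capture thread. The names process_image and time_one_frame_ms are hypothetical and not taken from my project. The body of process_image is only a CPU stand-in (it inverts the pixels) so that the sketch builds without nvcc. In the real project that function lives in the .cu file and does the host-to-device copy, the kernel launch, and the copy back. The sketch also assumes a C++11 compiler for std::chrono, which VS2005 does not provide.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Declaration seen by the .cpp side. extern "C" disables C++ name mangling,
// so the linker can match it with the definition compiled from the .cu file.
extern "C" void process_image(const unsigned char* in, unsigned char* out,
                              std::size_t n);

// CPU stand-in for the .cu implementation (assumption: the real version
// copies to the GPU, launches the kernel, and copies the result back).
extern "C" void process_image(const unsigned char* in, unsigned char* out,
                              std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<unsigned char>(255 - in[i]);  // invert each pixel
    }
}

// Times a single call to process_image and returns the elapsed milliseconds.
// Nothing from the capture thread is included in the measurement.
double time_one_frame_ms(const std::vector<unsigned char>& frame,
                         std::vector<unsigned char>& result) {
    assert(result.size() == frame.size());
    const auto t0 = std::chrono::steady_clock::now();
    process_image(frame.data(), result.data(), frame.size());
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

If the call is already slow when timed this way, the build or the GPU side is the more likely suspect. If it is fast, the time is probably being lost in the handoff between the capture thread and the processing call.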

EDIT: In the output window I see the .c and .i compilation lines, but not the two .gpu lines. Could this be the cause? If so, how can I solve it?