CUDA-OpenCV: low performance instead of high performance

Hi everyone,

We are running into performance problems when using (wrongly, I guess) the GPU of our computer.
We process 8 video cameras (720p) with 16 threads (8 acquisition threads and 8 image-analysis threads), and our code clearly cannot run in real time: it takes 1.5 seconds to process what the 8 cameras record in 1 second.
For information, here is our hardware configuration:

  • Intel Core i5-2500K

  • GPU: NVIDIA GTX 660 Ti

As explained everywhere on the web, we expected that running the processing on the GPU would speed up our code, but in fact it got slower.
The code compiles correctly after a successful installation of the whole CUDA environment, but the performance is clearly not there.
Our code is organized as follows:

  • Here is the processing function that runs on the GPU (color filter):

// Color filter on the GPU: thresholds each of the 3 channels against the
// lower and upper bounds stored in seuil_couleur.
GpuMat filtrage_gpu_couleur(GpuMat source,filtre_couleur seuil_couleur)
{
	GpuMat input[3];
	GpuMat interm[3],interm2[3],output[3];
	GpuMat out,temp,temp2;
	// Split the source into one GpuMat per channel
	cuda::split(source,input);
	// Lower bound: 255 where the channel value is above couleur_min
	cuda::threshold(input[0],interm[0],seuil_couleur.couleur_min[0],255,THRESH_BINARY);
	cuda::threshold(input[1],interm[1],seuil_couleur.couleur_min[1],255,THRESH_BINARY);
	cuda::threshold(input[2],interm[2],seuil_couleur.couleur_min[2],255,THRESH_BINARY);

	// Upper bound: 255 where the channel value is not above couleur_max
	cuda::threshold(input[0],interm2[0],seuil_couleur.couleur_max[0],255,THRESH_BINARY_INV);
	cuda::threshold(input[1],interm2[1],seuil_couleur.couleur_max[1],255,THRESH_BINARY_INV);
	cuda::threshold(input[2],interm2[2],seuil_couleur.couleur_max[2],255,THRESH_BINARY_INV);
	
	// Combine the six masks through successive masked copies
	interm[0].copyTo(temp);
	temp.copyTo(temp2,interm[1]);
	temp2.copyTo(temp,interm[2]);
	temp.copyTo(temp2,interm2[0]);
	temp2.copyTo(temp,interm2[1]);
	temp.copyTo(out,interm2[2]);

	return out;
}
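
To make the intent of the function above explicit, here is a minimal CPU-only sketch of the mask the thresholds appear to be aiming for, with no OpenCV dependency. `ColorBounds` and `color_mask_reference` are hypothetical names introduced only for illustration (`ColorBounds` stands in for `filtre_couleur`, whose definition is not shown in this post), and the interleaved 3-channel 8-bit pixel layout is an assumption. It is not a replacement for the GPU code, only a possible baseline to compare its output against.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for filtre_couleur: per-channel lower and upper bounds.
struct ColorBounds {
    std::array<std::uint8_t, 3> min;
    std::array<std::uint8_t, 3> max;
};

// Builds a single-channel mask from interleaved 3-channel 8-bit pixels
// (assumed layout). A pixel maps to 255 when every channel value v satisfies
// min[c] < v <= max[c], mirroring THRESH_BINARY (v > min) combined with
// THRESH_BINARY_INV (v <= max); otherwise it maps to 0.
std::vector<std::uint8_t> color_mask_reference(const std::vector<std::uint8_t>& pixels,
                                               const ColorBounds& bounds)
{
    const std::size_t count = pixels.size() / 3;
    std::vector<std::uint8_t> mask(count, 0);
    for (std::size_t i = 0; i < count; ++i) {
        bool inside = true;
        for (std::size_t c = 0; c < 3; ++c) {
            const std::uint8_t v = pixels[3 * i + c];
            inside = inside && v > bounds.min[c] && v <= bounds.max[c];
        }
        mask[i] = inside ? 255 : 0;
    }
    return mask;
}
```

Running this on a few frames and comparing the result with the downloaded output of `filtrage_gpu_couleur` would show whether the chain of masked copies really yields the combined mask.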

Do you see any coding mistake that prevents us from getting the expected GPU performance?
Is there something we need to do to transfer the data to the GPU before calling the function?

Thanks in advance for your help!
