CUDA OpenCV questions

Hi everyone. I’m another person new to CUDA (and to the forum) looking for some information from more experienced users and this is my first post, so go easy ;)

I’m trying to convert a lengthy C++ video processing algorithm to CUDA. The current algorithm is an MFC application that uses OpenCV functions quite a bit. As I’ve been going through the algorithm, the parts that seem best suited to parallelization are the for loops that initialize buffers or extract RGB values for calculations.

Firstly, for clarification: I’ll be using a GTX480 card - does anyone know if GPUCV supports this card and will be of any use to me here? Their website isn’t clear to me although I realize that this forum isn’t the best place to ask this question.

Secondly, if I have a for loop that goes through every pixel of each frame and uses OpenCV functions such as cvQueryFrame() and cvGet2D() to extract RGB values, how should I go about converting such a loop to a kernel? I can’t call these host functions from within a __global__ kernel. I’m asking because I suspect I’m not the only person who has run into this situation, and there’s likely an answer out there that I have not been able to find.

Thanks!

You may visit http://cuvilib.com/ . They have implemented some of the OpenCV stuff on GPU.

Thank you very much Crankie. I’ll definitely try to integrate this with my project soon and hopefully I won’t have to ask more questions about it (a lot more, at least :) ).

What exact algorithm/filtering are you looking for? If you have any CUVI-related questions, do leave them on the CUVI Forums for quick replies.

Well, right now I’m working on something a bit more basic (but I’m new to CUDA, so I wouldn’t consider it easy), and in a few months I’ll be working with algorithms that are more focused on actual video processing.

Basically I’m trying to take a loop like this:

    for(int num = 0; num < numFrame; ++num)
    {
        int count = 0;
        while(1)
        {
            frame = cvQueryFrame(capture);
            if(count == 0)
            {
                for(int i = 0; i < frame->width; ++i)
                {
                    for(int j = 0; j < frame->height; ++j)
                    {
                        temp = cvGet2D(frame, j, i); //Gets RGB values for pixel(j,i)
                        model[i][j][0] += temp.val[0]/numFrame;
                        model[i][j][1] += temp.val[1]/numFrame;
                        model[i][j][2] += temp.val[2]/numFrame;
                    }
                }
            }
            count++;
            if(count == 1)
                break;
            cvWaitKey(37);
        }
    }

temp is declared as: CvScalar temp = {0};

model is declared as:

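    double ***model;
    model = new double**[frame->width];
    for(int i = 0; i < frame->width; ++i)
    {
        model[i] = new double*[frame->height];
        for(int j = 0; j < frame->height; ++j)
        {
            // The trailing () zero-initializes the three channel sums;
            // plain new double[3] leaves them uninitialized before the += above.
            model[i][j] = new double[3]();
        }
    }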

…and convert it for use on a GPU like this:

    for(int num = 0; num < numFrame; ++num)
    {
        int count = 0;
        while(1)
        {
            frame = cvQueryFrame(capture);
            if(count == 0)
            {
                GPU_Wrapper();
            }
            count++;
            if(count == 1)
                break;
            cvWaitKey(37);
        }
    }
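A note on the model declaration above: a double*** built from many small allocations is scattered across host memory, so it cannot be sent to the GPU with a single copy. One common workaround, sketched here as an assumption rather than as the poster's code, is to keep the model in one contiguous buffer and compute the offset by hand. FlatModel, index(), at() and bytes() are hypothetical names, and the row-by-row layout is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement for the double*** model: one contiguous buffer,
// stored row by row, with the three channels of a pixel next to each other.
struct FlatModel {
    int width;
    int height;
    std::vector<double> data;  // width * height * 3 values, all starting at 0.0

    FlatModel(int w, int h)
        : width(w), height(h),
          data(static_cast<std::size_t>(w) * static_cast<std::size_t>(h) * 3, 0.0) {}

    // Offset of channel c of the pixel in column x, row y.
    // Plays the role of model[x][y][c] in the original layout.
    std::size_t index(int x, int y, int c) const {
        assert(x >= 0 && x < width && y >= 0 && y < height && c >= 0 && c < 3);
        return (static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                + static_cast<std::size_t>(x)) * 3 + static_cast<std::size_t>(c);
    }

    double& at(int x, int y, int c) { return data[index(x, y, c)]; }

    // Size in bytes of the whole model, i.e. the size of one device copy.
    std::size_t bytes() const { return data.size() * sizeof(double); }
};
```

With this layout the whole model is one block of bytes() bytes starting at data.data(), which is the shape cudaMalloc and cudaMemcpy expect.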

So basically, the kernel would ideally call cvGet2D and do the matrix addition, but I can’t call cvGet2D from within a kernel. I currently launch 640 blocks of 480 threads for a 640x480 video, so that each pixel’s RGB values can be extracted and added to the model in parallel, one pixel per thread. I’m not sure if CUVI will be of particular help here, though.
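Since cvGet2D only reads from the frame's pixel buffer, one way around the restriction is to copy that buffer to the device and do the same read with pointer arithmetic inside the kernel. The sketch below is a rough illustration under stated assumptions, not a definitive implementation: the frame is assumed to be 8-bit with 3 interleaved channels (the data behind IplImage's imageData, with widthStep bytes per row), and the model is assumed to be a flat width * height * 3 array of doubles rather than a double***. The names accumulatePixel, accumulateFrameCpu and accumulateFrameKernel are hypothetical. The per-pixel function is shared between host and device so the logic can be checked on the CPU first; the kernel only compiles under nvcc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Adds one frame's contribution for pixel (x, y) to the running average.
// pixels:    interleaved 8-bit, 3-channel data (assumed frame format)
// rowStride: bytes per image row, padding included (widthStep)
// model:     flat width * height * 3 buffer (assumed layout, row by row)
HOST_DEVICE inline void accumulatePixel(const unsigned char* pixels, int rowStride,
                                        double* model, int width,
                                        int x, int y, int numFrame) {
    const unsigned char* p = pixels
        + static_cast<std::size_t>(y) * static_cast<std::size_t>(rowStride)
        + static_cast<std::size_t>(x) * 3;
    double* m = model
        + (static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
           + static_cast<std::size_t>(x)) * 3;
    for (int c = 0; c < 3; ++c) {
        // Same arithmetic as model[i][j][c] += temp.val[c] / numFrame.
        m[c] += static_cast<double>(p[c]) / numFrame;
    }
}

// CPU reference: the two nested pixel loops, useful for checking GPU results.
inline void accumulateFrameCpu(const unsigned char* pixels, int rowStride,
                               double* model, int width, int height, int numFrame) {
    assert(numFrame > 0 && rowStride >= width * 3);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            accumulatePixel(pixels, rowStride, model, width, x, y, numFrame);
        }
    }
}

#ifdef __CUDACC__
// One thread per pixel. Neighbouring threads handle neighbouring pixels in a row.
// pixels and model must point to device memory.
__global__ void accumulateFrameKernel(const unsigned char* pixels, int rowStride,
                                      double* model, int width, int height,
                                      int numFrame) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < width * height) {  // guard for launches with spare threads
        accumulatePixel(pixels, rowStride, model, width,
                        idx % width, idx / width, numFrame);
    }
}
#endif
```

On the host side, the wrapper would roughly do the following: allocate both device buffers once with cudaMalloc and zero the model, copy frame->widthStep * frame->height bytes from frame->imageData to the device with cudaMemcpy after each cvQueryFrame, launch accumulateFrameKernel<<<640, 480>>> (640 * 480 threads, one per pixel of a 640x480 frame), and copy the model back once after the last frame. Because each thread owns exactly one pixel of the model, no atomics are needed.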