Hello all,
I have been doing some image processing with CUDA, using a webcam as the image source and NVIDIA’s image denoising code as a base. The image denoising example uses an unsigned integer to represent the color of a pixel (ARGB).
Unlike the original image denoising code, I am using PBOs (not textures) to pass data to the device (via the glBindBuffer - glBufferData - glMapBufferARB - memcpy - glUnmapBufferARB cycle). The video capture library I am using delivers the pixel information as BGR, so I copy the video buffer to the PBO with this for loop instead of a plain memcpy:
GLubyte* ptr = (GLubyte*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
// expand 3-byte BGR to 4 bytes per pixel (R, G, B, 0) in order to satisfy memory coalescing requirements
for(int r = 0; r < imageH; r++)
{
    for(int c = 0; c < imageW; c++)
    {
        ptr[r*imageW*4+4*c]   = frame[r*imageW*3+3*c+2];
        ptr[r*imageW*4+4*c+1] = frame[r*imageW*3+3*c+1];
        ptr[r*imageW*4+4*c+2] = frame[r*imageW*3+3*c];
        ptr[r*imageW*4+4*c+3] = 0;
    }
}
As you can guess, this per-byte copy decreases performance. I was wondering whether I can do such a copy more efficiently, maybe by using texture references? Any suggestions appreciated…
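
For reference, here is a minimal sketch of one possible CPU-side variation of the loop, not a measured fix. It walks both buffers with running pointers instead of recomputing indices, and issues one 32-bit store per pixel instead of four byte stores, which may help when the mapped PBO memory is slow to write. The function name bgrToRgbx is hypothetical (it is not part of the SDK sample or the capture library), and the sketch assumes a little-endian host and tightly packed rows with no padding, as the loop above also assumes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: expands tightly packed BGR bytes into 4-byte pixels
// whose bytes in memory are R, G, B, 0 (the same layout the loop above writes).
// Assumption: little-endian host, so the low byte of the packed word lands first.
void bgrToRgbx(const std::uint8_t* src, std::uint8_t* dst, std::size_t pixelCount)
{
    for (std::size_t i = 0; i < pixelCount; ++i)
    {
        const std::uint32_t b = src[0];
        const std::uint32_t g = src[1];
        const std::uint32_t r = src[2];

        // Pack once, then store the whole pixel with a single 4-byte copy.
        // memcpy avoids any alignment assumption about dst.
        const std::uint32_t packed = r | (g << 8) | (b << 16);
        std::memcpy(dst, &packed, sizeof(packed));

        src += 3;  // next BGR triple
        dst += 4;  // next 4-byte pixel
    }
}
```

With the variables from the loop above, the call would be bgrToRgbx(frame, ptr, (size_t)imageW * imageH). Whether this is actually faster depends on the driver and on how the buffer is mapped, so it would need to be timed.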