My question goes in a similar direction. So far my app does the following:
cudaMalloc two linear buffers A and B on the device side
cudaMemcpy an image from host to device memory buffer A
execute a kernel which loads parts of A into shared memory, does some transformation and stores the result values in B. After this, buffer B contains an image with 16-bit RGB elements.
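The three steps described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the RGB16 struct, the names transformKernel and runPipeline, the 8-bit grayscale input and the placeholder transformation are all hypothetical and stand in for the real code.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical pixel layout: three 16-bit channels per element of B.
struct RGB16 { unsigned short r, g, b; };

constexpr int kBlock = 256;  // threads per block, also the shared tile size

// Placeholder transformation (assumed): stage part of A in shared memory,
// expand each 8-bit value to 16 bits, and store a gray RGB16 pixel in B.
__global__ void transformKernel(const unsigned char* A, RGB16* B, int n)
{
    __shared__ unsigned char tile[kBlock];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = A[i];
    __syncthreads();
    if (i < n) {
        // 8-bit to 16-bit range expansion (255 * 257 == 65535).
        unsigned short v = static_cast<unsigned short>(tile[threadIdx.x] * 257);
        RGB16 p;
        p.r = v; p.g = v; p.b = v;
        B[i] = p;
    }
}

// Allocates A and B, uploads the image into A, and runs the kernel.
// On success *dB points to the result in device memory; the caller
// releases it with cudaFree. Nothing is copied back to the host.
cudaError_t runPipeline(const std::vector<unsigned char>& hImage, RGB16** dB)
{
    const int n = static_cast<int>(hImage.size());
    *dB = nullptr;
    if (n == 0) return cudaErrorInvalidValue;  // an empty grid cannot be launched

    unsigned char* dA = nullptr;
    cudaError_t err = cudaMalloc(&dA, n * sizeof(unsigned char));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(dB, n * sizeof(RGB16));
    if (err != cudaSuccess) { cudaFree(dA); return err; }

    err = cudaMemcpy(dA, hImage.data(), n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        transformKernel<<<(n + kBlock - 1) / kBlock, kBlock>>>(dA, *dB, n);
        err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
    }

    cudaFree(dA);
    if (err != cudaSuccess) { cudaFree(*dB); *dB = nullptr; }
    return err;
}
```

After runPipeline returns, the only copy of the result is the device buffer B, which is the starting point for the question below.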
Now my question is:
How do I efficiently display the image in B with OpenGL without doing any additional host-device data transfers?
I understand I have to somehow copy the contents of buffer B from the CUDA context into the OpenGL context, but I have no clue how exactly to do that.
I have already had a close look at the SDK projects that use PBOs for displaying images, but they confuse me even more.
imageDenoising, for instance:
//in the main function you can find these three lines of code:
CUDA_SAFE_CALL( cudaMemcpyToArray(a_Src, 0, 0, h_Src, imageW * imageH * sizeof(uchar4), cudaMemcpyHostToDevice) );
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, imageW, imageH, 0, GL_RGBA, GL_UNSIGNED_BYTE, h_Src);
glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, imageW * imageH * 4, h_Src, GL_STREAM_COPY);
The same with boxFilter:
//the initCuda() function contains this line of code:
CUDA_SAFE_CALL( cudaMemcpyToArray( d_array, 0, 0, h_img, size, cudaMemcpyHostToDevice));
//additionally the initOpenGl() function copies the same h_img again:
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, width*height*sizeof(float), h_img, GL_STREAM_DRAW_ARB);
The obvious question is: Don't they copy the image from host to device 3 or 2 times, respectively!?
I would appreciate any help.