Hello,
I am trying to map an OpenGL texture into CUDA to display my kernel's result.
I am calculating a simple grayscale 2D image; the pixel type is unsigned char.
I set my texture’s format like this:
glTexImage2D(GL_TEXTURE_2D,0,GL_LUMINANCE,C_WIDTH,C_HEIGHT,0,GL_LUMINANCE,GL_UNSIGNED_BYTE,NULL);
Then I get an error from cudaGraphicsGLRegisterImage: “invalid argument”.
The CUDA manual says:
“supports all texture formats with 1, 2, or 4 components”
Can anyone tell me why I get this error, and how I can register a grayscale texture for use with CUDA?
In my code I do this (color output buffer setup, I use floating-point textures):
glGenTextures(1, &s_glColor);
glBindTexture(GL_TEXTURE_2D, s_glColor);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, w, h, 0, GL_RGBA, GL_FLOAT, 0);
// for an unsigned char image, replace 'GL_RGBA32F_ARB' with 'GL_RGBA' and 'GL_FLOAT' with 'GL_UNSIGNED_BYTE'
glTexParameteri(GL_TEXTURE_2D , GL_TEXTURE_MIN_FILTER , GL_NEAREST);
glBindTexture(GL_TEXTURE_2D, 0);
glGenBuffers(1, &s_glColorBuffer);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, s_glColorBuffer);
glBufferData(GL_PIXEL_UNPACK_BUFFER, w * h * 4 * sizeof(float), 0, GL_DYNAMIC_DRAW);
// for an unsigned char image, replace 'sizeof(float)' with 'sizeof(unsigned char)'
cuGraphicsGLRegisterBuffer(&s_cuColorBuffer, s_glColorBuffer, CU_GRAPHICS_MAP_RESOURCE_FLAGS_WRITE_DISCARD);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
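A note on the buffer size used above: for a single-channel unsigned char image the PBO is much smaller, and with OpenGL's default GL_UNPACK_ALIGNMENT of 4 each row is read starting at a 4-byte boundary, so widths that are not a multiple of 4 need padded rows (or a call to glPixelStorei(GL_UNPACK_ALIGNMENT, 1)). Below is a minimal C++ sketch of the size calculation. The helper names rowStrideBytes and pboSizeBytes are made up for illustration and are not part of OpenGL or CUDA.

```cpp
#include <cstddef>

// Hypothetical helper: bytes per image row as OpenGL reads it from an unpack
// buffer, rounded up to a multiple of GL_UNPACK_ALIGNMENT (default 4).
std::size_t rowStrideBytes(std::size_t width, std::size_t channels,
                           std::size_t bytesPerComponent,
                           std::size_t unpackAlignment = 4) {
    std::size_t raw = width * channels * bytesPerComponent;
    return (raw + unpackAlignment - 1) / unpackAlignment * unpackAlignment;
}

// Hypothetical helper: size to pass to glBufferData for the whole image.
// The last row is padded too, which is slightly more than strictly needed
// but always safe.
std::size_t pboSizeBytes(std::size_t width, std::size_t height,
                         std::size_t channels, std::size_t bytesPerComponent,
                         std::size_t unpackAlignment = 4) {
    return rowStrideBytes(width, channels, bytesPerComponent, unpackAlignment) * height;
}
```

For example, pboSizeBytes(w, h, 4, sizeof(float)) matches the 'w * h * 4 * sizeof(float)' used above, while pboSizeBytes(w, h, 1, sizeof(unsigned char)) would be the size for a grayscale byte image. If rows are padded, the kernel has to write with the same row stride.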
To display the texture I do:
cuGraphicsMapResources(1, &s_cuColorBuffer, 0);
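cuGraphicsResourceGetMappedPointer(&s_ColorBuffer, &bytes, s_cuColorBuffer); // 's_ColorBuffer' is a device pointer you pass to the kernel
// ... launch the kernel that writes into 's_ColorBuffer' here ...
cuGraphicsUnmapResources(1, &s_cuColorBuffer, 0); // unmap before OpenGL touches the buffer again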
glActiveTexture(GL_TEXTURE0);
if (!glIsEnabled(GL_TEXTURE_2D)) glEnable(GL_TEXTURE_2D);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, s_glColorBuffer);
glBindTexture(GL_TEXTURE_2D, s_glColor);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, s_glW, s_glH, GL_RGBA, GL_FLOAT, 0);
And draw a screen-aligned quad with the bound texture.
You can use ‘GL_LUMINANCE32F_ARB’ instead of ‘GL_RGBA32F_ARB’ for a single-channel floating-point texture. I use similar code to produce a depth buffer. You can also use the
4-channel texture and write the same value to the R, G and B channels, but I think I don’t need to say this.
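In case it helps, here is a minimal host-side C++ sketch of that last idea, replicating each grayscale value into R, G and B of an RGBA pixel. The function name grayToRgba and the opaque alpha of 255 are my assumptions; in a real kernel the loop body would run once per thread instead of in a loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expand an 8-bit grayscale image to tightly packed RGBA8.
// Each gray value is copied to R, G and B; alpha is set to fully opaque.
std::vector<std::uint8_t> grayToRgba(const std::vector<std::uint8_t>& gray) {
    std::vector<std::uint8_t> rgba(gray.size() * 4);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        rgba[4 * i + 0] = gray[i];  // R
        rgba[4 * i + 1] = gray[i];  // G
        rgba[4 * i + 2] = gray[i];  // B
        rgba[4 * i + 3] = 255;      // A (assumed opaque)
    }
    return rgba;
}
```

This costs four times the memory of a single-channel buffer, which is the trade-off against using a luminance format.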
Meanwhile I figured out that I can only do what I want with a PBO.
An OpenGL texture can only be mapped as a CUDA array, and it is read-only for kernels (accessible only through texture fetches).
With the PBO it now works fine (I looked at the boxFilter example from the CUDA SDK).
For simplicity I did not modify my kernels; I call cudaMemcpy (cudaMemcpyDeviceToDevice)
with my kernel result as the source and the mapped pointer of the PBO as the destination.
This takes only ~50 usec (based on the bandwidth printed by the bandwidthTest SDK example).
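For anyone who wants to repeat that estimate, it is just bytes divided by bandwidth. A rough C++ sketch follows. The function name copyTimeMicroseconds is made up, and the numbers in the usage note (a 1024 x 1024 unsigned char image, 20 GB/s) are hypothetical, not my actual image size or measured bandwidth. It ignores call overhead and however the tool counts read and write traffic.

```cpp
// Hypothetical helper: rough copy time in microseconds for `bytes` bytes
// moved at `gigabytesPerSecond` (1 GB = 1e9 bytes).
double copyTimeMicroseconds(double bytes, double gigabytesPerSecond) {
    double seconds = bytes / (gigabytesPerSecond * 1e9);
    return seconds * 1e6;
}
```

With the hypothetical values, copyTimeMicroseconds(1024.0 * 1024.0, 20.0) comes out at about 52 usec.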