I have written a method where I take an OpenGL texture ID as input, read the contents of the texture, store it in CUDA's memory space and output a CUdeviceptr. I have studied the postProcessGL example from the SDK. However, I am using the driver API.
In short, I do as follows:
take an OpenGL texture ID as input
allocate memory for a pixel buffer object, pbo
bind the pbo (as GL_PIXEL_PACK_BUFFER)
use glGetTexImage to read data from the texture (which I have bound using the provided ID) into the pbo
bind a 0 buffer and a 0 texture
register the pbo with CUDA using cuGLRegisterBufferObject()
map the pbo to a CUdeviceptr (devPtrTemp) using cuGLMapBufferObject()
copy from devPtrTemp to another CUdeviceptr (devPtr) using cuMemcpyDtoD()
unmap the pbo using cuGLUnmapBufferObject()
unregister the pbo using cuGLUnregisterBufferObject()
set the output to devPtr
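For reference, the steps above can be sketched with the driver API roughly like this. This is a sketch, not the poster's actual code: width, height and the RGBA float format are assumptions, a current CUDA/GL context is assumed, and error checking is omitted.

```cpp
// Sketch of the steps above (driver API, old cuGL* interop functions).
// Assumes texId refers to a width x height GL_RGBA32F texture.
size_t size = width * height * 4 * sizeof(float);

// Allocate a PBO and bind it as GL_PIXEL_PACK_BUFFER
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, size, NULL, GL_STREAM_READ);

// Read the texture into the PBO (last argument is an offset into the PBO)
glBindTexture(GL_TEXTURE_2D, texId);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, 0);

// Bind a 0 buffer and a 0 texture
glBindTexture(GL_TEXTURE_2D, 0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// Register, map, copy device-to-device, unmap, unregister
CUdeviceptr devPtr, devPtrTemp;
size_t mappedSize;
cuMemAlloc(&devPtr, size);
cuGLRegisterBufferObject(pbo);
cuGLMapBufferObject(&devPtrTemp, &mappedSize, pbo);
cuMemcpyDtoD(devPtr, devPtrTemp, size);
cuGLUnmapBufferObject(pbo);
cuGLUnregisterBufferObject(pbo);
// devPtr is the output
```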
It seems to work fine. However, is this really the most efficient way to do it? For example, I would like to avoid copying from one CUdeviceptr to another if possible. Since I want to unmap and unregister the buffer it seems to me like I have to do this copy though, or am I wrong?
Also, does anyone know whether using glGetTexImage together with pixel buffer objects actually means copying data and whether this is done in gpu memory or not?
If you want to use the PBO data after unmapping and unregistering the PBO, then yes, you should do a memcopy, since OpenGL allows PBOs to be moved in memory (even to CPU memory).
Do you do this operation once or for each frame? Note that if a PBO is being used as a texture source, you can register it once and then not unregister until you do the clean up.
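A minimal sketch of that register-once pattern (the init/perFrame/cleanup split is hypothetical, just to show where each call belongs):

```cpp
// Register once at startup, map/unmap each frame, unregister at cleanup.
void init(GLuint pbo)    { cuGLRegisterBufferObject(pbo); }

void perFrame(GLuint pbo) {
    CUdeviceptr dptr;
    size_t size;
    cuGLMapBufferObject(&dptr, &size, pbo);  // cheap compared to (re)registering
    // ... launch kernels that read/write dptr ...
    cuGLUnmapBufferObject(pbo);              // give the buffer back to OpenGL
}

void cleanup(GLuint pbo) { cuGLUnregisterBufferObject(pbo); }
```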
There is an implicit memcopy when you call glTex[sub]Image with a PBO. This is due to OpenGL and is independent of CUDA. The vast majority of the time this will be a copy of memory on the GPU, but, as I mentioned earlier, there is a possibility that a PBO has been moved to CPU memory (in which case you have a copy over PCIe). I believe the latter case can occur when you have PBOs and textures exceeding GPU memory resources.
Could you also explain, or point me toward related documents/code/etc…, how to copy data from a texture to a CUDA array? My purpose is to do some general-purpose computations that my ex-colleague implemented in Cg. The code snippet is like below:
glGenTextures(1, &g_texture_tmp1); // generate a texture ID
glEnable(GL_TEXTURE_2D); // enable 2D texturing
glBindTexture(GL_TEXTURE_2D, g_texture_tmp1); // bind the texture as a 2D texture
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, window_width, window_height, 0, GL_LUMINANCE, GL_FLOAT, NULL);
cg_fprofile = cgGLGetLatestProfile(CG_GL_FRAGMENT); //get the latest fragment profile available
cgGLSetOptimalOptions(cg_fprofile); //for optimal compilation
fragment_flow = cgCreateProgramFromFile(cg_context,CG_SOURCE,FragmentProgramFlow,cg_fprofile,NULL,NULL); //compile and read Cg fragment program
cgGLLoadProgram(fragment_flow); //load the Cg fragment program
//get the handles of input variables in the Cg fragment program
cg_input2 = cgGetNamedParameter(fragment_flow, "input2");
cg_input3 = cgGetNamedParameter(fragment_flow, "input3");
// Enable the Cg program
//enable the input texture for the Cg program
//Draw a quadrangle
// copy the result from the first-pass rendering to the tmp texture
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, window_width,window_height);
//disable the input texture for the Cg program
//disable the Cg program
In Cg, the texture ID is mapped directly to its cg_input2, but in CUDA it seems that I need to map it to a CUDA array… I would like to know how to do that mapping.
What do you want to do with the texture in your CUDA code:
are you going to fetch from it, like you would in a Cg program? If so, do you need texture functionality, or would coalesced accesses to memory with the data suffice?
Is the texture in question used by OpenGL at all? Meaning, is it used only by the CUDA kernel, or by both CUDA and graphics rendering?
As far as mapping a graphics texture to a CUDA array is concerned, register the PBO, map it to get a device pointer, and then do a memcopy from that pointer to a CUDA array. Then bind the array to a CUDA texture and fire off your kernel. However, depending on your answers to the questions above, all these steps may not be necessary.
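With the driver API, that sequence could look roughly like this. It is a sketch under assumptions: the PBO is already registered and holds width x height RGBA floats, and hModule is a loaded CUmodule containing a texture reference named "tex" (both names are placeholders).

```cpp
// Map the registered PBO to get a device pointer
CUdeviceptr dptr;
size_t size;
cuGLMapBufferObject(&dptr, &size, pbo);

// Create a CUDA array matching the data layout (RGBA float assumed)
CUDA_ARRAY_DESCRIPTOR desc;
desc.Format      = CU_AD_FORMAT_FLOAT;
desc.NumChannels = 4;
desc.Width       = width;
desc.Height      = height;
CUarray array;
cuArrayCreate(&array, &desc);

// Copy from the mapped pointer into the CUDA array
CUDA_MEMCPY2D cpy = {0};
cpy.srcMemoryType = CU_MEMORYTYPE_DEVICE;
cpy.srcDevice     = dptr;
cpy.srcPitch      = width * 4 * sizeof(float);
cpy.dstMemoryType = CU_MEMORYTYPE_ARRAY;
cpy.dstArray      = array;
cpy.WidthInBytes  = width * 4 * sizeof(float);
cpy.Height        = height;
cuMemcpy2D(&cpy);
cuGLUnmapBufferObject(pbo);

// Bind the array to the module's texture reference, then launch the kernel
CUtexref texRef;
cuModuleGetTexRef(&texRef, hModule, "tex");
cuTexRefSetArray(texRef, array, CU_TRSA_OVERRIDE_FORMAT);
```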
As far as I understand, cg_input4 is used like the texture reference defined in the CUDA programming guide. Note that at this point the data stored in this texture is not displayed; I mean, it has no relationship with the PBO we defined earlier. I don't know how to map the data in this texture to a CUDA array… In the programming guide I didn't find functions for this purpose. Even following your suggestion to map the PBO to a device pointer, I don't know which function I should call. Maybe I misunderstood something; can you explain further? If you need, I can mail the .cpp file (combined with Cg) to you. Thanks in advance for your reply.
(upload://zud1TuLNptls3dnfHR0nhlJMz6c.cpp) (44.3 KB)
My point is that if the texture you are referring to is never used for rendering, just computing, don't even create an OpenGL texture; just load the data directly with CUDA.
For rendering output from CUDA, create a PBO, register it with CUDA, map it to get the pointer, then have CUDA write its output into the memory addressed by that pointer. Then unmap the PBO and display it with either glDrawPixels or an OpenGL texture.
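A sketch of that output path, assuming a width x height RGBA8 image and a launch wrapper runKernel() of your own (the wrapper name is hypothetical); the PBO is assumed to be registered with cuGLRegisterBufferObject already:

```cpp
// One frame of "CUDA writes the PBO, OpenGL displays it" (error checks omitted).
CUdeviceptr dptr;
size_t size;
cuGLMapBufferObject(&dptr, &size, pbo);
runKernel(dptr, width, height);   // hypothetical: your kernel writes RGBA8 pixels
cuGLUnmapBufferObject(pbo);

// Display: bind the PBO as the unpack source and draw from it
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);  // 0 = offset into PBO
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
```

Alternatively, copy the PBO contents into a texture with glTexSubImage2D and render a textured quad, which is usually preferred over glDrawPixels.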