Global variable in CUDA context

Hello everybody,

I’ve been trying to optimize an old CUDA program I made a few months ago. The program has the following structure:

/////////////////////////////////////////////////////////////////////////////////////
CUDA set-up: all the necessary steps to configure CUDA in this application.
OpenGL set-up: all the necessary steps to configure OpenGL in this application.

OpenGL functions: when the user presses a key, the processing applied to the image changes.
/////////////////////////////////////////////////////////////////////////////////////

An OpenGL-based program runs in a loop: the same procedure is repeated on every iteration, and a key press only changes which kernels are run. Each iteration consists of three steps:

  1. Map the OpenGL resource that holds the data so CUDA can access it.
  2. Call the CUDA processing routine, which:
    2.1. Binds the data (at this point, a texture) to an array.
    2.2. Executes some kernels on this array (which kernels run depends on the key pressed).
    2.3. Unbinds the data.
  3. Unmap the resource so that OpenGL can render the data.

The data is an 8-bit gray-scale image. So far the program works as well as I need, but I noticed that the kernel part wastes time dynamically allocating the data to be processed. This allocation is always the same: every time the cycle begins, the program allocates the same data arrays again. How can I do this dynamic allocation only once, say, at the beginning?

If anybody knows how to do this, or knows an example, please let me know.


Try allocating the device memory once from the host code (cudaMalloc, or cuMemAlloc with the driver API) before launching the kernels, keeping the returned pointer in a global variable, and reallocate only when the size you need changes. Pass that device pointer to the kernel as a parameter.
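A minimal sketch of that suggestion using the CUDA runtime API. The names `DeviceBuffer`, `ensureCapacity`, `releaseBuffer` and `invertKernel` are hypothetical, the 640x480 image size is an assumption, and the OpenGL map/bind/unmap steps are left out. A plain `for` loop stands in for the display loop. It needs a CUDA-capable GPU and nvcc.

```cuda
// Sketch (assumptions noted): allocate the work buffer once, reuse it every
// frame, and reallocate only when the required size grows.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical holder for a device buffer that outlives individual frames.
struct DeviceBuffer {
    unsigned char *ptr = nullptr;  // device memory for 8-bit gray-scale pixels
    size_t capacity = 0;           // bytes currently allocated
};

// Reallocate only if the requested size exceeds the current capacity.
// Returns true when a (re)allocation actually happened.
bool ensureCapacity(DeviceBuffer &buf, size_t bytes) {
    if (bytes <= buf.capacity) return false;
    if (buf.ptr) CUDA_CHECK(cudaFree(buf.ptr));
    CUDA_CHECK(cudaMalloc(&buf.ptr, bytes));
    buf.capacity = bytes;
    return true;
}

// Free the buffer once, when the application shuts down.
void releaseBuffer(DeviceBuffer &buf) {
    if (buf.ptr) CUDA_CHECK(cudaFree(buf.ptr));
    buf.ptr = nullptr;
    buf.capacity = 0;
}

// Hypothetical per-frame kernel: inverts each 8-bit pixel in place.
// The device pointer arrives as an ordinary kernel parameter.
__global__ void invertKernel(unsigned char *img, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) img[i] = (unsigned char)(255 - img[i]);
}

int main() {
    const size_t width = 640, height = 480;  // assumed image size
    const size_t n = width * height;
    DeviceBuffer work;

    ensureCapacity(work, n);                 // the only cudaMalloc
    CUDA_CHECK(cudaMemset(work.ptr, 10, n)); // fill with a test value

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    for (int frame = 0; frame < 3; ++frame) {  // stands in for the display loop
        bool reallocated = ensureCapacity(work, n);  // size unchanged: no-op
        assert(!reallocated);
        (void)reallocated;
        invertKernel<<<blocks, threads>>>(work.ptr, n);
        CUDA_CHECK(cudaGetLastError());
    }
    CUDA_CHECK(cudaDeviceSynchronize());

    unsigned char first = 0;
    CUDA_CHECK(cudaMemcpy(&first, work.ptr, 1, cudaMemcpyDeviceToHost));
    std::printf("first pixel after 3 frames: %u\n", (unsigned)first);  // 245
    releaseBuffer(work);
    return 0;
}
```

In this sketch the buffer only grows: asking for a smaller size keeps the existing allocation instead of freeing and reallocating it, which avoids churn if the image size fluctuates. Reallocating on every size change works too; either way `cudaMalloc` is no longer called on every iteration of the loop.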


Hello c.master.matso,

You’ve been very helpful. I’m gonna implement what you just said.

Thanks a lot!