Trouble getting CUDA and OpenGL to work together

Thanks in advance for looking at my problem. I am very new to both OpenGL and CUDA, so please be patient with me!

I am attempting to use CUDA to perform some image processing tasks and display the results using OpenGL. My program worked when I had CUDA write the output back to the host, load it into a texture (glTexImage2D) and display it by binding the texture to a polygon. I am attempting to “cut out the middle man” and load the output into the texture object on the device. Is this appropriate?

Essentially, can I (listing the steps and the commands I think might be pertinent to explain what I am doing):

  1. Create the texture as before
    GLuint texobj;
    glGenTextures(1,&texobj);
    glBindTexture(GL_TEXTURE_2D, texobj);
    glTexImage2D(GL_TEXTURE_2D,0,GL_RGBA,width,height,0,GL_RGBA,GL_FLOAT,NULL);
    glTexParameteri(…);
    glBindTexture(GL_TEXTURE_2D,0);

  2. Register with CUDA
    cudaGraphicsResource_t texobj_CUDA;
    cudaGraphicsGLRegisterImage(&texobj_CUDA, texobj,GL_TEXTURE_2D,cudaGraphicsMapFlagsNone);

  3. Get a pointer to the cudaArray (on the device) that represents the texture object on the device (at least I think that this is what I am doing here…)
    cudaArray *cuArray;
    cudaGraphicsMapResources(1,&texobj_CUDA,0);
    cudaGraphicsSubResourceGetMappedArray(&cuArray,texobj_CUDA,0,0);

  4. Write data to the cudaArray (cuArray) on the device
    This is the step where the code tells me I’m doing something wrong.
    Assume that I have a float4 array on the device (f4array_dev) with the RGBA values I want to display. How do I copy this array into the texture object? I tried:
    cudaMemcpyToArray(cuArray,0,0,f4array_dev,width*height*sizeof(float4),cudaMemcpyDeviceToDevice);
    This compiles but gives a runtime error message that I have an “invalid argument.” I tried creating my float4 array with cudaMallocPitch and using cudaMemcpy2DToArray, but that gives a similar error message. I am beginning to think I am missing something fundamental. Any advice?

  5. Release texture object on the device so that OpenGL can access it again for display.
    cudaGraphicsUnmapResources(1,&texobj_CUDA,0);

  6. And when I am done with the program, I won’t forget to unregister the texture object.
    cudaGraphicsUnregisterResource(texobj_CUDA);

Additional questions:

  1. When one calls cudaGraphicsSubResourceGetMappedArray, it is supposed to return a pointer to a cudaArray. The Programming Guide in 3.2.8.1 says that it should be possible to use cudaMemcpy2D to write to this array. Why not cudaMemcpy2DToArray (or cudaMemcpyToArray)? cudaMemcpy2D requires the pitch as an input. How do I determine the pitch for a cudaArray?
  2. Is it possible to check the parameters of the cudaGraphicsResource returned by cudaGraphicsMapResources or cudaGraphicsGLRegisterImage? (Basically, is it the size and data type I think it is?)

Again, thanks for bearing with me.

The internal format should be changed to GL_RGBA32F, I assume. GL_RGBA means 8 bits per channel.
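The arithmetic behind this answer, as a small host-only C++ sketch: with GL_RGBA the driver typically allocates 8 bits per channel (4 bytes per texel), while each float4 is 16 bytes, so the count width*height*sizeof(float4) passed to cudaMemcpyToArray is four times what the mapped array holds. That overrun is consistent with the "invalid argument" error. The constant names and the 640x480 size are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>

// Bytes per texel; the constant names are hypothetical.
constexpr std::size_t kTexelBytesRGBA8   = 4 * 1;              // GL_RGBA: four 8-bit channels
constexpr std::size_t kTexelBytesRGBA32F = 4 * sizeof(float);  // GL_RGBA32F_ARB: four 32-bit floats
constexpr std::size_t kFloat4Bytes       = 4 * sizeof(float);  // same size as CUDA's float4

// Bytes occupied by a width x height image with the given texel size.
std::size_t imageBytes(std::size_t width, std::size_t height, std::size_t texelBytes) {
    return width * height * texelBytes;
}

// Compares the byte count from the question (width*height*sizeof(float4))
// with what a texture of the given texel size can hold.
bool copyFits(std::size_t width, std::size_t height, std::size_t texelBytes) {
    const std::size_t copy = imageBytes(width, height, kFloat4Bytes);
    const std::size_t capacity = imageBytes(width, height, texelBytes);
    std::printf("%zu x %zu: copying %zu bytes into %zu bytes -> %s\n",
                width, height, copy, capacity, copy <= capacity ? "fits" : "overruns");
    return copy <= capacity;
}
```

For 640x480, the float4 copy is 4915200 bytes against 1228800 bytes for an 8-bit RGBA texture, while a GL_RGBA32F_ARB texture holds exactly 4915200 bytes.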

Looks all right to me. Did you check whether the mapped cuArray is zero?

If you are using a 257/258 display driver, you might want to try the older 197 driver.

CUDA-OpenGL interop does not work with the newer drivers, at least for my application.

See this thread: http://forums.nvidia.com/index.php?showtopic=171631

Thank you! Thank you!

When the line reads:

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, width, height, 0, GL_RGBA, GL_FLOAT, NULL);

it works just fine.