In our CUDA-enabled application, we use an OpenGL VBO to pass data between a CUDA kernel and OpenGL rendering. With the CUDA 2.2 driver, the following code ran smoothly.
...
unsigned int vbo;
unsigned int size = 1024;
glGenBuffersARB(1, &vbo);
// bind the vbo so the following buffer calls apply to it
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
// claim vbo size, but pass null data
glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, NULL, GL_DYNAMIC_DRAW_ARB);
// register vbo to CUDA first
cudaGLRegisterBufferObject(vbo);
...
// fill vbo with real data later on
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, size, realdata);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
...
Under CUDA 2.3, however, the same code caused a problem: it corrupted the subsequent calls to cudaMalloc/cudaMemcpy.
We fixed this problem by moving glBufferSubDataARB before cudaGLRegisterBufferObject.
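For anyone hitting the same issue, here is a minimal sketch of the reordered sequence. It is not our exact code: the helper name createFilledVbo is hypothetical, GLEW is only assumed as the extension loader, and a current OpenGL context with a successful glewInit() is assumed to exist before the call.

```cuda
#include <stddef.h>
#include <GL/glew.h>          // assumption: GLEW supplies the ARB buffer entry points
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>  // declares cudaGLRegisterBufferObject

// Hypothetical helper (not from the original post): allocate the VBO,
// upload the real data, and only then register the buffer with CUDA.
// Assumes a current OpenGL context and an initialized extension loader.
static cudaError_t createFilledVbo(GLuint *vbo, const void *realdata,
                                   GLsizeiptrARB size)
{
    glGenBuffersARB(1, vbo);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, *vbo);

    // Claim the storage without supplying data yet.
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, NULL, GL_DYNAMIC_DRAW_ARB);

    // Fill the buffer BEFORE registering it with CUDA.
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, size, realdata);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);

    // Register last; the caller should check the returned error code.
    return cudaGLRegisterBufferObject(*vbo);
}
```

Checking the returned cudaError_t against cudaSuccess at the call site may also help narrow down whether the registration itself fails or whether the corruption only shows up in later calls.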
The CUDA Programming Guide doesn't mention any required order for these operations, so I assume this is an internal CUDA bug.