cudaGraphicsResourceGetMappedPointer returns "unknown error"

I am creating an OpenGL texture like this:

glGenTextures( 1, &board );
glBindTexture( GL_TEXTURE_2D, board );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST );
glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, NULL);

I get a valid handle in board, so I assume the texture was created successfully. I want to share this texture with CUDA, so I register and map the resource:

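cudaGLSetGLDevice(0);
// board is the texture handle created above; registration takes a cudaGraphicsRegisterFlags* value
cudaGraphicsGLRegisterImage( &boardImage, board, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsNone );
cudaGraphicsMapResources( 1, &boardImage, 0 );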

Then I try to get the mapped pointer like this:

float4* mappedPointer;
size_t mappedSize;
cudaGraphicsResourceGetMappedPointer( (void**)&mappedPointer, &mappedSize, boardImage );

Unfortunately, this call returns an error. I made sure the texture wasn’t bound in the OpenGL context, just in case, but it still fails. cudaGetErrorString yields “unknown error”, so I’m pretty stuck here. I’d appreciate any ideas.
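An “unknown error” is easier to track down when every CUDA call in the sequence is checked, since an earlier call may already have failed. A minimal sketch of such a check follows; cudaOk and CUDA_OK are hypothetical names made up for illustration, not part of the CUDA API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): returns true on success; otherwise
// prints the failing expression with the error's name and description.
inline bool cudaOk(cudaError_t err, const char* what, const char* file, int line)
{
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s:%d: %s -> %s (%s)\n", file, line, what,
                 cudaGetErrorName(err), cudaGetErrorString(err));
    return false;
}

// Wraps a call so the message shows the exact expression and its location.
#define CUDA_OK(call) cudaOk((call), #call, __FILE__, __LINE__)

// Intended use with the calls from the question (needs a current GL context):
//   if (!CUDA_OK(cudaGraphicsMapResources(1, &boardImage, 0))) return;
```

Wrapping the register, map and get-pointer calls this way shows which of them is the first to fail.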

Okay, my bad. It seems cudaGraphicsResourceGetMappedPointer works only for buffers, and textures must use cudaGraphicsSubResourceGetMappedArray. Now it’s working fine.
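For anyone who hits the same error, here is a rough sketch of the texture path. It assumes a current OpenGL context and a complete GL_TEXTURE_2D; mapTexture and unmapTexture are hypothetical helper names, and the error handling is only one possible arrangement:

```cuda
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>  // also pulls in the platform's OpenGL header

// Hypothetical helper: registers and maps `tex`, then returns the cudaArray
// that backs mip level 0 through `arrayOut`.
cudaError_t mapTexture(GLuint tex, cudaGraphicsResource_t* resOut, cudaArray_t* arrayOut)
{
    cudaError_t err = cudaGraphicsGLRegisterImage(resOut, tex, GL_TEXTURE_2D,
                                                  cudaGraphicsRegisterFlagsNone);
    if (err != cudaSuccess) return err;

    err = cudaGraphicsMapResources(1, resOut, 0);
    if (err != cudaSuccess) {
        cudaGraphicsUnregisterResource(*resOut);
        return err;
    }

    // Textures are backed by cudaArrays, so request the array
    // (array index 0, mip level 0) instead of a linear device pointer.
    err = cudaGraphicsSubResourceGetMappedArray(arrayOut, *resOut, 0, 0);
    if (err != cudaSuccess) {
        cudaGraphicsUnmapResources(1, resOut, 0);
        cudaGraphicsUnregisterResource(*resOut);
    }
    return err;
}

// Hypothetical helper: release in reverse order once CUDA is done with the array.
void unmapTexture(cudaGraphicsResource_t res)
{
    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}
```

The resource has to be unmapped before OpenGL uses the texture again. In a render loop one would normally register once and only map and unmap per frame.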

Unfortunately, I can’t get my head around cudaArrays. Reading works fine, but I also need to update the array’s contents. According to the thread “Writing to cuda array in kernel?” it’s possible, but cudaMemcpyToArray fails outright when I compute the size using sizeof(float). It works with a byte-sized memcpy, but that apparently has no effect on the texture. Any ideas?
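One way to write into a cudaArray from a kernel is through a surface object. The sketch below is not a confirmed fix for the problem above. It uses a plain cudaArray from cudaMallocArray as a stand-in for the mapped texture, and fillKernel, fillArray and fillAndReadBack are hypothetical names. For the OpenGL texture itself, my understanding is that it would have to be registered with cudaGraphicsRegisterFlagsSurfaceLoadStore for surface writes to be allowed.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread writes one RGBA32F texel through the surface object.
__global__ void fillKernel(cudaSurfaceObject_t surf, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float4 texel = make_float4((float)x, (float)y, 0.0f, 1.0f);
    // surf2Dwrite takes the x coordinate in BYTES, not in texels.
    surf2Dwrite(texel, surf, x * (int)sizeof(float4), y);
}

// Hypothetical helper: wraps `array` in a surface object and launches the kernel.
cudaError_t fillArray(cudaArray_t array, int width, int height)
{
    cudaResourceDesc desc = {};
    desc.resType = cudaResourceTypeArray;
    desc.res.array.array = array;

    cudaSurfaceObject_t surf = 0;
    cudaError_t err = cudaCreateSurfaceObject(&surf, &desc);
    if (err != cudaSuccess) return err;

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    fillKernel<<<grid, block>>>(surf, width, height);

    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    cudaDestroySurfaceObject(surf);
    return err;
}

// Stand-alone check without OpenGL: fills a float4 array and copies it back.
// Returns an empty vector if any CUDA call fails.
std::vector<float4> fillAndReadBack(int width, int height)
{
    cudaChannelFormatDesc fmt = cudaCreateChannelDesc<float4>();
    cudaArray_t array = nullptr;
    if (cudaMallocArray(&array, &fmt, width, height,
                        cudaArraySurfaceLoadStore) != cudaSuccess) return {};

    std::vector<float4> host(width * height);
    size_t rowBytes = width * sizeof(float4);  // copy widths are in bytes as well
    cudaError_t err = fillArray(array, width, height);
    if (err == cudaSuccess)
        err = cudaMemcpy2DFromArray(host.data(), rowBytes, array, 0, 0,
                                    rowBytes, height, cudaMemcpyDeviceToHost);
    cudaFreeArray(array);
    if (err != cudaSuccess) return {};
    return host;
}
```

The memcpy route has the same unit rule: cudaMemcpy2DToArray takes its width in bytes, so one row of an RGBA32F texture is width * sizeof(float4) bytes, not width * sizeof(float).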

I might end up using PBOs if I can’t get this working, but I’m concerned about performance, and AFAIK the memcpy method should be faster, right?