PBO/glReadPixels/cudaGLMapBufferObject performance difference between vista and linux

(this is a slightly updated post originally posted in the developer forums: http://developer.nvidia.com/forums/index.php?showtopic=4010)

hi nvidia users

i observe a strange difference in the runtime behavior of my app (using cuda 2.3) between windows and linux (same machine, a macbookpro with a geforce 9600M). consider the following snippet, where i transfer the contents of two renderbuffers that i rendered to earlier over to CUDA:

[codebox]

// setup: create FBO with two renderbuffers, render to texture, etc

// A
glReadBuffer(GL_COLOR_ATTACHMENT0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, mPBO[0]);
glReadPixels(0, 0, mOSWidth, mOSHeight, GL_BGRA, GL_FLOAT, 0); // NOTE: in linux GL_RGBA is necessary for fast pixel reads, in vista, both formats are slow
cutilSafeCall(cudaGLMapBufferObject((void**)&mCudaDevStartPixels, mPBO[0]));

glReadBuffer(GL_COLOR_ATTACHMENT1);
glBindBuffer(GL_PIXEL_PACK_BUFFER, mPBO[1]);
glReadPixels(0, 0, mOSWidth, mOSHeight, GL_BGRA, GL_FLOAT, 0);
cutilSafeCall(cudaGLMapBufferObject((void**)&mCudaDevStartSymbols, mPBO[1]));

// B
cudaCall( … )

// C
cutilSafeCall(cudaGLUnmapBufferObject(mPBO[0]));
cutilSafeCall(cudaGLUnmapBufferObject(mPBO[1]));
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// process cuda result etc…

[/codebox]

in linux, the code between markers A and B takes about 0.1ms; in windows (vista64) it takes ~20ms (!). in contrast, the code between B and C takes the same amount of time on both platforms.
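
a note on how per-section numbers like these can be taken: the sketch below is one possible way to time a block of code with std::chrono. `time_section_ms` is a hypothetical helper and not part of the original app. GL calls may be queued by the driver and return early, so whether a glFinish() is placed at the end of the timed block is an assumption that changes what the number actually means.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original app): runs `section` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_section_ms(F&& section) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(section)();  // e.g. the code between markers A and B
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

the A-to-B part would then be wrapped in a lambda and passed to `time_section_ms`, optionally ending in glFinish() so that queued GL work is included in the measurement.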

i use the binary nvidia linux driver 190.42 and windows driver 195.62.

any ideas? is it an issue with my code or with the driver?

thanks a lot & best wishes for 2010,

simon

the issue is solved.

as described in http://forums.nvidia.com/index.php?s=&…st&p=499019, using cudaGLSetGLDevice instead of cudaSetDevice fixes the slow pixel transfers with vista…
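
for reference, a minimal sketch of that change, assuming the CUDA 2.3 runtime API used in this thread. the helper name `initCudaForGLInterop` and the device index are hypothetical; only the swap of cudaSetDevice for cudaGLSetGLDevice comes from the linked post. as far as i understand the CUDA documentation, the call has to be made before any other runtime call, and the GL context should already exist at that point.

```cpp
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// Hypothetical helper: call once at startup, after the OpenGL context has
// been created and before any other CUDA runtime call.
// `device` is an assumed device index (e.g. 0 on a single-GPU machine).
bool initCudaForGLInterop(int device) {
    // Previously: cudaSetDevice(device);
    // cudaGLSetGLDevice also marks the host thread as using GL interop.
    cudaError_t err = cudaGLSetGLDevice(device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGLSetGLDevice failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

the rest of the snippet from the first post (PBO reads, map, kernel, unmap) stays unchanged.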

cheers,

simon