(this is a slightly updated version of a post i originally made in the developer forums: http://developer.nvidia.com/forums/index.php?showtopic=4010)
hi nvidia users,
i'm seeing a strange difference in the runtime behavior of my app (cuda 2.3) between windows and linux (same machine, a macbook pro with a geforce 9600M). consider the following snippet, where i transfer my previously rendered-to renderbuffer data to CUDA:
[codebox]
// setup: create FBO with two renderbuffers, render to texture, etc
// A
glReadBuffer(GL_COLOR_ATTACHMENT0);
glBindBuffer(GL_PIXEL_PACK_BUFFER, mPBO[0]);
glReadPixels(0, 0, mOSWidth, mOSHeight, GL_BGRA, GL_FLOAT, 0); // NOTE: on linux, GL_RGBA is necessary for fast pixel reads; on vista both formats are slow
cutilSafeCall(cudaGLMapBufferObject((void**)&mCudaDevStartPixels, mPBO[0]));
glReadBuffer(GL_COLOR_ATTACHMENT1);
glBindBuffer(GL_PIXEL_PACK_BUFFER, mPBO[1]);
glReadPixels(0, 0, mOSWidth, mOSHeight, GL_BGRA, GL_FLOAT, 0);
cutilSafeCall(cudaGLMapBufferObject((void**)&mCudaDevStartSymbols, mPBO[1]));
// B
cudaCall( … )
// C
cutilSafeCall(cudaGLUnmapBufferObject(mPBO[0]));
cutilSafeCall(cudaGLUnmapBufferObject(mPBO[1]));
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
// process cuda result etc…
[/codebox]
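for completeness, the PBO setup done beforehand looks roughly like this (simplified sketch; the buffer size and usage hint are illustrative, not my exact code):
[codebox]
// create the two pack PBOs and register them with CUDA once up front;
// cudaGLMapBufferObject only works on registered buffers
glGenBuffers(2, mPBO);
for (int i = 0; i < 2; ++i)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, mPBO[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER,
                 mOSWidth * mOSHeight * 4 * sizeof(float), // RGBA float pixels
                 0, GL_DYNAMIC_READ);
    cutilSafeCall(cudaGLRegisterBufferObject(mPBO[i]));
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
[/codebox]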
on linux, the code between markers A and B takes about 0.1ms; on windows (vista64) it takes ~20ms (!). in contrast, the code between B and C takes the same amount of time on both platforms.
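for the record, i time the section roughly like this (simplified sketch using the sdk's cutil timers, not my exact code; glFinish before and cudaThreadSynchronize after make sure queued driver work is counted inside the measurement):
[codebox]
unsigned int timer = 0;
cutilCheckError(cutCreateTimer(&timer));

glFinish();                             // drain pending GL commands first
cutilCheckError(cutStartTimer(timer));

// ... section between A and B: glReadPixels + cudaGLMapBufferObject ...

cutilSafeCall(cudaThreadSynchronize()); // wait for queued CUDA work
cutilCheckError(cutStopTimer(timer));
printf("A..B: %.3f ms\n", cutGetTimerValue(timer));
cutilCheckError(cutDeleteTimer(timer));
[/codebox]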
i'm using the nvidia binary linux driver 190.42 and the windows driver 195.62.
any ideas? is it an issue with my code or with the driver?
thanks a lot & best wishes for 2010,
simon