BUG in the 64 Bit driver of CUDA 2.3: PBO texture size of 2048 causes a 20ms delay

1.) PBO texture size of 2048 causes a 20ms delay

Configuration: 64Bit Windows, newest 64Bit Drivers, GTX280

If the PBO size for the screen texture in my CUDA environment is set to 1024x1024, there is no problem, but if it's set to 2048x2048, there is a 20ms delay in the driver when registering and unregistering the buffer:

cudaGLRegisterBufferObject(pbo)
cudaGLUnregisterBufferObject(pbo)

On 32 Bit there is no such delay.
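
To make the delay reproducible for others, here is a minimal sketch of how the register/unregister pair could be timed on the host. The helper names time_call_ms and average_call_ms are made up for this post and are not part of CUDA; the sketch only assumes a C++11 compiler.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: wall-clock time in milliseconds for one call to fn().
double time_call_ms(const std::function<void()>& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: mean of time_call_ms over `runs` calls,
// so that a single one-off spike does not dominate the result.
double average_call_ms(const std::function<void()>& fn, int runs) {
    assert(runs > 0);
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        total += time_call_ms(fn);
    }
    return total / runs;
}
```

With a valid GL context and an existing pbo, it could be used like this:

    double ms = average_call_ms([&] {
        cudaGLRegisterBufferObject(pbo);
        cudaGLUnregisterBufferObject(pbo);
    }, 100);

Calling cudaThreadSynchronize() before the measurement should keep pending GPU work from being counted as part of the register/unregister time.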

2.) GL_STREAM_DRAW, which worked well in older CUDA versions, causes a 20ms delay in CUDA 2.3

Configuration: Cuda:2.3, Windows XP 32 Bit, Drivers 6.14.11.9038, GTX260 & GTX285 tested

In older CUDA versions (like 2.1), GL_STREAM_DRAW worked well to create the PBO (glBufferData).
In CUDA 2.3, the usage parameter needs to be changed to GL_DYNAMIC_COPY to avoid the delay.
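
For reference, a small sketch of the two values that end up in the glBufferData call for the 2048x2048 case. PboParams and make_pbo_params are names invented for this post, and 4 bytes per pixel (RGBA8) is an assumption, since the pixel format may differ in other setups. The enum values are copied from the OpenGL headers only so that the sketch compiles without them.

```cpp
#include <cassert>
#include <cstddef>

// Numeric values of the OpenGL usage hints, duplicated here so the sketch
// builds without GL headers. Real code should use the GL constants directly.
const unsigned int kGlStreamDraw  = 0x88E0;  // GL_STREAM_DRAW (old setting)
const unsigned int kGlDynamicCopy = 0x88EA;  // GL_DYNAMIC_COPY (new setting)

// Hypothetical container for the size and usage arguments of glBufferData.
struct PboParams {
    std::size_t size_bytes;
    unsigned int usage;
};

// Assumes tightly packed pixels; always picks GL_DYNAMIC_COPY as the usage hint.
PboParams make_pbo_params(std::size_t width, std::size_t height,
                          std::size_t bytes_per_pixel) {
    assert(bytes_per_pixel > 0);
    PboParams p;
    p.size_bytes = width * height * bytes_per_pixel;
    p.usage = kGlDynamicCopy;
    return p;
}
```

The PBO would then be allocated with something like:

    PboParams p = make_pbo_params(2048, 2048, 4);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, p.size_bytes, NULL, p.usage);

Under the RGBA8 assumption this is a 16 MiB buffer, compared with 4 MiB at 1024x1024.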

If any of you knows a workaround for the 1st bug, that would be very helpful.