cudaMemcpy issue

I believe I’ve run into a bug with cudaMemcpy in CUDA 2.3. I’ll need to do some work to build a reproducer and file it, but I thought I’d post the problem here first in case I’m just missing something.

Besides CUDA, my app uses OpenGL / GLSL. It currently runs only on 32-bit Windows XP, and we are developing on a 2 GB GeForce GTX 285.

I use the following types of memory and interoperability schemes:
*There are 3 large chunks of graphics memory set up as PBOs. Combined, these 3 chunks are limited to 1.5 GB. As far as the bug is concerned, this data is static and doesn’t change.
*There are several small VBOs that are created and destroyed
*There are hundreds of RAM-to-CUDA (host-to-device) transfers using cudaMemcpy, with either stack-based or heap-based objects as the source

We have dozens of CUDA kernels at this point, and they are run hundreds of times per execution cycle before the bug is hit.

In a nutshell, the problem seems to be that after hundreds of successful calls, cudaMemcpy fails to copy data from the CPU to the GPU. The failure is consistent: it happens in precisely the same spot on the same machine with the same data set, and the error is always “invalid argument”. It occurs right after I create and register a new VBO (I’ve already created dozens of them before the bug occurs). I’ve tried and assumed all sorts of things. The long and short of it is that the only thing that matters is whether I copy the small bit of CPU data to page-locked memory first before calling cudaMemcpy… if I do, everything works.
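One thing that could be ruled out here is that the “invalid argument” actually belongs to an earlier call (for example the VBO registration) rather than to the copy itself, since cudaGetLastError returns the last error recorded by the runtime. Below is a rough sketch of how the copy could be wrapped to separate the two cases. The names checkCuda, CHECK_CUDA and copyToDeviceChecked are made up for illustration; they are not from my app or from the CUDA API.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print which call failed and where, return true on success.
static bool checkCuda(cudaError_t err, const char* what, const char* file, int line) {
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s:%d: %s failed: %s\n",
                 file, line, what, cudaGetErrorString(err));
    return false;
}
#define CHECK_CUDA(call) checkCuda((call), #call, __FILE__, __LINE__)

// Hypothetical wrapper around the host-to-device copy.
bool copyToDeviceChecked(void* dst, const void* src, size_t bytes) {
    // cudaGetLastError() returns the last recorded runtime error and resets it,
    // so a failure reported here comes from some earlier call, not from the copy.
    if (!CHECK_CUDA(cudaGetLastError())) return false;
    // With the error state cleared, a failure here is reported by this call.
    return CHECK_CUDA(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));
}
```

If the first check fires, the copy is only the messenger; if only the second fires, the error is being reported by cudaMemcpy itself.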

My workaround for now is to use page-locked memory for all my RAM-to-CUDA transfers when I call cudaMemcpy. I’m not using the async versions, so I thought this wasn’t necessary…
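For reference, the workaround amounts to something like the sketch below: stage the pageable data in a page-locked buffer, then copy from there. This is a simplified illustration, not the exact code from my app. copyViaPinned is a hypothetical name, and allocating the staging buffer on every call is only done to keep the example short (a real app would presumably allocate it once and reuse it, since page-locked allocations are expensive).

```cuda
#include <cstddef>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: copy pageable host data to the device through a
// page-locked (pinned) staging buffer.
cudaError_t copyViaPinned(void* dst, const void* src, size_t bytes) {
    if (bytes == 0) return cudaSuccess;  // nothing to copy

    // Allocate page-locked host memory for staging.
    void* staging = NULL;
    cudaError_t err = cudaMallocHost(&staging, bytes);
    if (err != cudaSuccess) return err;

    // Stage the stack- or heap-based source data in the pinned buffer.
    std::memcpy(staging, src, bytes);

    // Synchronous host-to-device copy from the pinned buffer.
    err = cudaMemcpy(dst, staging, bytes, cudaMemcpyHostToDevice);

    // Release the staging buffer; report the copy error first if there was one.
    cudaError_t freeErr = cudaFreeHost(staging);
    return (err != cudaSuccess) ? err : freeErr;
}
```

A call such as copyViaPinned(devPtr, &localStruct, sizeof(localStruct)) then takes the place of the direct cudaMemcpy from the stack or heap object.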

Does this ring any bells?