Device memory synchronization between different host threads

Hi, guys,

I ran into a problem recently. In my original OpenGL program, everything works fine: the main thread (host thread A) binds a vertex array (d_pos) to a Vertex Buffer Object for graphics rendering. Now I need to modify the vertex array periodically from another thread (host thread B) running at 1000 Hz, while the graphics rendering is still done in host thread A at 30 Hz.

Since CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread, d_pos cannot be shared between these two host threads.

The only solution I can think of right now is to allocate a second device buffer for the same vertex array (d_pos2) and use host memory (h_pos) to synchronize d_pos and d_pos2. More specifically, thread B copies the modified d_pos2 to h_pos every millisecond, and thread A copies h_pos to d_pos roughly every 30 milliseconds for graphics rendering. But done this way, performance could be a big problem because of the frequent memory copies between host and device, especially since the vertex array is quite large in my case.
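
To make the scheme concrete, here is a rough host-only sketch of what I mean. It is not real CUDA code: plain std::vector objects stand in for d_pos and d_pos2, and each vector assignment marks the place where a cudaMemcpy would go. The StagedVertices struct, the update_step and render_step functions, and the mutex around h_pos are names I made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Host-only stand-in for the staging scheme. In the real program d_pos and
// d_pos2 would be device buffers; here they are ordinary vectors so the
// control flow can be followed without a GPU.
struct StagedVertices {
    std::vector<float> d_pos;   // buffer rendered by thread A (stand-in for the VBO-backed array)
    std::vector<float> d_pos2;  // buffer modified by thread B
    std::vector<float> h_pos;   // host staging buffer shared by both threads
    std::mutex h_lock;          // assumption: h_pos is guarded by a simple mutex

    explicit StagedVertices(std::size_t n)
        : d_pos(n, 0.0f), d_pos2(n, 0.0f), h_pos(n, 0.0f) {}

    // Thread B, once per millisecond: modify d_pos2, then publish it to h_pos.
    void update_step(float delta) {
        for (float& v : d_pos2) v += delta;         // stand-in for the modifying kernel
        std::lock_guard<std::mutex> guard(h_lock);
        h_pos = d_pos2;                             // would be a device-to-host cudaMemcpy
    }

    // Thread A, once per rendered frame: pull the latest h_pos into d_pos.
    void render_step() {
        std::lock_guard<std::mutex> guard(h_lock);
        assert(h_pos.size() == d_pos.size());
        d_pos = h_pos;                              // would be a host-to-device cudaMemcpy
        // ... draw from d_pos here ...
    }
};
```

With thread B at 1000 Hz and thread A at 30 Hz, update_step runs about 33 times for every render_step, so most of the device-to-host copies are overwritten before thread A ever reads them. That is exactly the overhead I am worried about.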

Does anyone have any ideas about this?