Now that the CUDA <-> OpenGL interop API is out, I’m adapting my previous code to use it instead of transferring the data back to the CPU and then to OpenGL.
My question is: Is there any overhead in using a mapped VBO instead of allocating the same memory using cudaMalloc?
To give a little context, the application I’m working on is a generic deformable-body simulator supporting multiple models (mass-springs, FEM, SPH fluids, …) as well as different integration algorithms (explicit Euler, RK4, implicit, …). To do this the system relies on a set of data vectors holding the current state as well as temporary values. My problem is that I don’t know in advance which vector will contain the final state to be rendered. So I can either:

1. copy the final state to an OpenGL VBO (using a device-to-device memcpy),
2. allocate all vectors as VBOs (either as separate VBOs or a single large VBO), or
3. change the design so that the final state is always stored in the same vector.
Obviously the first solution adds the overhead of one extra device-to-device copy per frame, while the third might require lots of changes in the code. The second would therefore be the easiest and most efficient solution, but I’m not sure how well the API would handle it…
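For concreteness, here is a minimal sketch of the first option as I understand the interop API — register the VBO once, then each frame map it, memcpy the final state into it, and unmap before rendering. The function name `copyStateToVBO` and the `finalState`/`bytes` parameters are placeholders of mine, not names from my actual code:

```cpp
#include <cuda_gl_interop.h>  // CUDA <-> OpenGL interop API

// vbo: an OpenGL buffer created with glGenBuffers/glBufferData.
// finalState: device pointer to whichever vector ended up holding the
// state to render (placeholder; in my code this varies per integrator).
void copyStateToVBO(GLuint vbo, const float* finalState, size_t bytes)
{
    float* mapped = 0;

    // Map the VBO into the CUDA address space.
    cudaGLMapBufferObject((void**)&mapped, vbo);

    // Option 1: one extra device-to-device copy per frame.
    cudaMemcpy(mapped, finalState, bytes, cudaMemcpyDeviceToDevice);

    // Unmap so OpenGL can use the buffer for rendering again.
    cudaGLUnmapBufferObject(vbo);
}

// At init time, before the first map (once per buffer):
//     cudaGLRegisterBufferObject(vbo);
// At shutdown:
//     cudaGLUnregisterBufferObject(vbo);
```

Option 2 would instead keep every state vector mapped during simulation and let the kernels write into `mapped` directly, exactly as if it had come from cudaMalloc — which is precisely the case where I’m wondering about hidden overhead.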
PS: the result of this will be released as open source in the SOFA simulation framework, hopefully within the next few weeks :)