I'm wondering what the most efficient way is to access triangle data in CUDA. I thought about having three interop vertex buffers (VBs) and an index buffer, but I don't know how to access them in GL for rendering. I also thought about using a grid of length [num_vertices * 3] and width [3], and using threadIdx.y to access the triangles when loading them into shared memory. However, I suspect a width of 3 threads would not be optimal, so perhaps the VB size in the x dimension and 1 in the other two, i.e. (vb_size, 1, 1). Either way, I still don't know how to display the three VBs. Is there a better way?
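One common alternative to three separate VBs, offered here as a sketch rather than as the thread's answer: keep a single vertex buffer of packed positions plus the GL index (element) buffer, register both with cudaGraphicsGLRegisterBuffer, and launch a 1D grid with one thread per triangle, in the spirit of the (vb_size, 1, 1) idea. GL can then draw the same buffers directly with glDrawElements(GL_TRIANGLES, ...). The code shows only the indexing a per-triangle kernel would do, written as plain C++ so it can be checked on the host. Vertex, triangle_area and all_triangle_areas are hypothetical names, the area computation is just a placeholder workload, and in a real program the two pointers would come from cudaGraphicsResourceGetMappedPointer after cudaGraphicsMapResources.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: one tightly packed xyz position per vertex.
struct Vertex {
    float x, y, z;
};

// Body of a "one thread per triangle" kernel. In CUDA, tid would be
// blockIdx.x * blockDim.x + threadIdx.x and the host loop below disappears.
float triangle_area(const Vertex* vertices, const std::uint32_t* indices,
                    std::size_t tid) {
    // Each triangle owns three consecutive entries of the index buffer.
    const Vertex& a = vertices[indices[3 * tid + 0]];
    const Vertex& b = vertices[indices[3 * tid + 1]];
    const Vertex& c = vertices[indices[3 * tid + 2]];

    // Edge vectors and their cross product; its length is twice the area.
    const float ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
    const float vx = c.x - a.x, vy = c.y - a.y, vz = c.z - a.z;
    const float cx = uy * vz - uz * vy;
    const float cy = uz * vx - ux * vz;
    const float cz = ux * vy - uy * vx;
    return 0.5f * std::sqrt(cx * cx + cy * cy + cz * cz);
}

// Host stand-in for a 1D launch covering every triangle once.
std::vector<float> all_triangle_areas(const std::vector<Vertex>& vertices,
                                      const std::vector<std::uint32_t>& indices) {
    assert(indices.size() % 3 == 0);  // index buffer holds whole triangles
    const std::size_t num_triangles = indices.size() / 3;
    std::vector<float> areas(num_triangles);
    for (std::size_t tid = 0; tid < num_triangles; ++tid) {
        areas[tid] = triangle_area(vertices.data(), indices.data(), tid);
    }
    return areas;
}
```

For a unit quad stored as four vertices with indices {0, 1, 2, 0, 2, 3}, both triangles come out with area 0.5. Shared corners are stored once, which is the main saving over keeping one VB per triangle corner.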
That depends very much on the compute capability of your card. If it is 2.0 or higher (Fermi or later), you don't need to do anything special, as consecutive misaligned accesses are handled well by the cache.
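To make the cache argument concrete, here is a rough host-side count of the cache lines a warp touches when each of its 32 threads reads one consecutive 12-byte vertex (three floats). The 128-byte line size is an assumption matching the L1 cache of compute capability 2.x, and lines_touched_by_warp is a hypothetical helper, not a CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Assumption: 128-byte L1 cache lines, as on compute capability 2.x.
constexpr std::size_t kLineBytes = 128;
constexpr std::size_t kWarpSize = 32;

// Number of distinct cache lines touched when lane i of a warp reads
// element (first + i) of an array whose elements are elem_bytes wide.
std::size_t lines_touched_by_warp(std::size_t first, std::size_t elem_bytes) {
    assert(elem_bytes > 0);
    std::set<std::size_t> lines;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        const std::size_t begin = (first + lane) * elem_bytes;
        const std::size_t end = begin + elem_bytes - 1;  // last byte read
        // A misaligned element may straddle two lines; count every line it hits.
        for (std::size_t line = begin / kLineBytes; line <= end / kLineBytes; ++line) {
            lines.insert(line);
        }
    }
    return lines.size();
}
```

Starting at element 0, the warp reads bytes 0 to 383 and touches 3 lines; starting at element 1, the same 384 bytes are misaligned and touch 4. Because the warp's reads are contiguous, a 12-byte stride costs at most one extra line per warp compared with the aligned case.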
From your answer I can't tell whether you are describing the on-chip texture cache. I am aware that texture accesses don't require the same safety checking because tex3D fetches are cached, but I wasn't sure whether interop vertex buffers are also stored in texture memory. If that is the case, then using plain arrays instead of vertex buffers might improve performance for programs with many non-graphical kernels, perhaps with a simple kernel that copies the results of a computation into the VB at the end. Is this hypothesis correct?
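The copy step proposed above can be sketched as follows, assuming the compute kernels keep positions in separate x, y and z arrays while the VB stores interleaved xyz per vertex. The per-element function is what a one-thread-per-vertex CUDA kernel would do, and the host loop stands in for the launch; Vertex, pack_vertex and pack_positions are hypothetical names. If the arrays already use the VB's layout, a single cudaMemcpy with cudaMemcpyDeviceToDevice into the mapped VB pointer would do the same job without a custom kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interleaved layout of the vertex buffer (xyz per vertex).
struct Vertex {
    float x, y, z;
};

// Body of the "copy results into the VB" kernel: element i of the compute
// arrays (structure of arrays) is packed into the VB's interleaved layout.
// In CUDA, i = blockIdx.x * blockDim.x + threadIdx.x, and the kernel must
// return early when i >= n.
void pack_vertex(const float* xs, const float* ys, const float* zs,
                 Vertex* vb, std::size_t i) {
    vb[i] = Vertex{xs[i], ys[i], zs[i]};
}

// Host stand-in for launching one thread per vertex over the whole buffer.
void pack_positions(const std::vector<float>& xs, const std::vector<float>& ys,
                    const std::vector<float>& zs, std::vector<Vertex>& vb) {
    assert(xs.size() == ys.size() && ys.size() == zs.size());
    vb.resize(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) {
        pack_vertex(xs.data(), ys.data(), zs.data(), vb.data(), i);
    }
}
```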