READ ONLY data for both CUDA and GLSL

Hello all!

I am developing a system that performs computations in two steps (both executed on the GPU): the first step is solved with CUDA and the second one is solved with GLSL.
In each step the programs (CUDA and GLSL) will have to read ~100 float elements from the same READ-ONLY data array.
The total size of this array is several hundred KB.

The question is: how can I store this data array in GPU memory so that both the CUDA and the GLSL programs can read it?

Some thoughts:

  1. If I store this entire array in a VBO, I can read it with CUDA, but I cannot with GLSL.
  2. If I store this entire array in the GPU global memory (with cudaMalloc() and cudaMemcpy()), I can access this data from CUDA, but not from GLSL… again.
  3. If I store this data in a texture, I SUPPOSE I can access it from CUDA and also from GLSL… BUT texture fetches are slower than simple array or VBO reads.

--> This (number 4) seems promising…

  4. I read about the Bindable Uniform Buffer Objects, and they seemed perfect. I know that these buffers can be read by GLSL as ordinary const arrays (good, fast!), and I SUPPOSE that they can be read by CUDA since they are buffers (like VBOs)… however, these buffers have an upper size limit… on my current system (GeForce 8400) they can be at most 64KB… and I suppose that it is not so different for the 8600, 8800… and so on. I suppose that this size limit is due to the fact that these buffers are created in the GPU constant memory. Am I correct? My system allows for up to 12 buffers like this, but this does not solve my problem… I need a contiguous chunk of memory (it is easier and faster to index and access).
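
A quick back-of-the-envelope check of the limits quoted in item 4, as a minimal C++ sketch. The 64KB and 12-buffer figures are the ones reported above for the GeForce 8400 (they are hard-coded here, not queried from a driver), and all helper names are made up for illustration. It also shows the extra indexing step that splitting the array across several buffers would require.

```cpp
#include <cassert>
#include <cstddef>

// Figures quoted in the post for a GeForce 8400 (assumed, not queried).
constexpr std::size_t kBufferBytes = 64 * 1024;  // per-buffer size limit
constexpr std::size_t kMaxBuffers  = 12;         // buffers available

// How many elements of a given size fit in one buffer.
constexpr std::size_t elemsPerBuffer(std::size_t elemBytes) {
    return kBufferBytes / elemBytes;
}

// How many buffers are needed to hold arrayBytes (rounded up).
constexpr std::size_t buffersNeeded(std::size_t arrayBytes) {
    return (arrayBytes + kBufferBytes - 1) / kBufferBytes;
}

// True if the array fits when split across all available buffers.
constexpr bool fitsWhenSplit(std::size_t arrayBytes) {
    return buffersNeeded(arrayBytes) <= kMaxBuffers;
}

// Position of one element once the array is split across buffers.
struct SplitIndex {
    std::size_t buffer;  // which buffer holds the element
    std::size_t local;   // index of the element inside that buffer
};

// Maps a global element index to (buffer, local index). This is the
// bookkeeping that a single contiguous array would not need.
inline SplitIndex splitIndex(std::size_t globalIndex, std::size_t elemBytes) {
    assert(elemBytes > 0 && elemBytes <= kBufferBytes);
    const std::size_t perBuffer = elemsPerBuffer(elemBytes);
    return { globalIndex / perBuffer, globalIndex % perBuffer };
}
```

With 4-byte floats, one buffer holds 16384 elements, so element 20000 would land in buffer 1 at local index 3616. Twelve such buffers add up to 768KB, so an array larger than that would not fit even when split.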

So… some questions regarding these Bindable Uniform Buffer Objects…:

 Question 1) Are they in fact limited in size?
 Question 2) Supposing that these buffers are in fact limited to 64KB: can I (first) create such a buffer, (second) upload all my array data to the GPU global memory (with cudaMalloc() and cudaMemcpy()), and (third) copy small portions of the data located in global memory into the memory occupied by that buffer?
  The problem with my Question 2 is that those Bindable Uniform Buffers are created with OpenGL calls, and I just get an integer ID for the buffer (and not a pointer to the memory allocated on the GPU). Is there a way to find these pointers from the buffer IDs generated by OpenGL?

 Is there any other solution better than those mentioned above? :)

 Ok, that's all for a while folks... Any help is welcome!!!!
 [ ]s
 Capagot

The answer to your question depends on exactly what you want to do with the data in OpenGL. It is possible to write to vertex buffer objects in CUDA, and read the vertex data in the vertex shader in GLSL.

You can’t read textures allocated in CUDA from OpenGL.

You can write to a pixel buffer object, and then load a texture from this data, but this requires a copy in video memory.

Instead of bindable uniform buffer objects, you might want to look at this extension, which lets you texture directly from 1D buffer objects in your fragment shader:
http://developer.download.nvidia.com/openg…ffer_object.txt

Thanks for the reply Simon!

I just want to store some float4 constants in a READ-ONLY array to feed some future computations in CUDA and GLSL (Fragment Shader). These constants are not generated on the fly… they already exist and are loaded from a file. This array will have around 1MB of float4 data.

Once I set up this array of constants… I launch a CUDA program and send it an offset index into that array. The CUDA program will fetch around 80 float4 values from the array (starting at the offset index), and perform some computations. The CUDA program will not change any value in this array… it will just read it.

In the next step I run a GLSL Fragment Program. Again, an offset index is sent to this fragment program. This program will also read the same values from this array (the same values previously read by the CUDA program), and perform some computations. Again, this GLSL Fragment Program will not change any value in this data array.

In the next frame a new offset index for that array will be generated, and everything happens again, in the same way.

If I could store the whole array in the GPU constant memory, and make it available for reading to both CUDA and the GLSL Fragment Program… it would be perfect. On the other hand, I know that 1MB will not fit in the constant memory. So… I thought that perhaps I could load all the data into the GPU global memory, and copy only portions of it to the constant memory, on demand. I know how to do that only for CUDA, but I don't know how to do that for both CUDA and GLSL Fragment Programs at the same time.
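
To make the "copy only a portion on demand" idea concrete, here is a minimal host-side C++ model of it. It only illustrates the indexing and the bounds check: the `Float4` struct, `stageWindow`, and the fixed window of 80 elements ("around 80 float4 values" above) are assumptions made for this sketch, and ordinary `std::vector`s stand in for the large array and the small constant-memory-sized buffer. In a real program the copy itself would be done through the CUDA and OpenGL APIs rather than on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a float4 element (16 bytes); hypothetical, for illustration.
struct Float4 { float x, y, z, w; };

// Number of elements read per step, taken from "around 80 float4 values".
constexpr std::size_t kWindowElems = 80;

// Size in bytes of one window: 80 * 16 = 1280, far below a 64KB limit.
constexpr std::size_t windowBytes() {
    return kWindowElems * sizeof(Float4);
}

// Copies the window [offset, offset + kWindowElems) of the full array into
// the small staging buffer. Returns false, leaving staging untouched, if
// the window would run past the end of the full array.
bool stageWindow(const std::vector<Float4>& fullArray,
                 std::size_t offset,
                 std::vector<Float4>& staging) {
    if (offset > fullArray.size() ||
        fullArray.size() - offset < kWindowElems) {
        return false;
    }
    const auto first = fullArray.begin() + static_cast<std::ptrdiff_t>(offset);
    staging.assign(first, first + static_cast<std::ptrdiff_t>(kWindowElems));
    return true;
}
```

Each frame, the new offset index would be passed to something like `stageWindow`, and both programs would then read elements 0 to 79 of the small buffer instead of indexing the full 1MB array.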

I will take a look at the extension you mentioned… it seems to be faster than ordinary texture fetches, since no filtering/mipmapping/etc. is involved.
Thank you!!!
Christian