Problem using multiple Buffer Objects in CUDA

Hello all!

Sorry, the message is a bit long…

A few days ago I posted a message asking about the best approach when you want to make a big array of constants available to both CUDA and GLSL (fragment) programs. One approach I was considering was Bindable Uniform Buffer Objects. They seem perfect for my task… but, unfortunately, they are limited in size (in this case, 64KB).

The second possible approach was 1D Texture Buffer Objects. I was a bit reluctant about this because I think (I still don't know) that accessing such buffers means going through the whole texture fetch machinery (buffers, filtering, texture cache... etc.), and I thought this could slow things down. But, without anything better in mind, I went for it and used them.

Now I have run into a really serious problem: not the choice between Buffer Objects, but getting the program to work at all.

The current scenario is the following: my CPU program writes some data to a Vertex Buffer Object (aka VBO) and to a Texture Buffer Object (aka TBO). Afterwards, the CPU sends those buffers to the GPU. A CUDA program reads data from both Buffer Objects (the VBO and the TBO), makes some calculations and writes the results only to the VBO.

It works nicely, but only if I use a few equations. The problem is that I have several cascading equations (by cascading I mean: the result of one feeds the next one, and so on). If I use, for example, only the first 3 equations, it works. If I put in all the equations (a chain of ~7), it stops working. By "stops working" I mean: the CUDA program does not HALT, it just seems that the CUDA program is never called: the input data (VBO) is not changed by the CUDA program.

The structure of my code is right below:

=== CPU CODE ===

[codebox] // Sets up the VBO ////////////////

glGenBuffers(1,&vbo);

 glBindBuffer(GL_ARRAY_BUFFER, vbo);

 glBufferData(GL_ARRAY_BUFFER, data1_bytesize + data2_bytesize, NULL,  GL_DYNAMIC_DRAW);

 glBufferSubData(GL_ARRAY_BUFFER, 0, data1_bytesize, data1_ptr);

 glBufferSubData(GL_ARRAY_BUFFER, data1_bytesize, data2_bytesize, data2_ptr);

 glBindBuffer(GL_ARRAY_BUFFER, 0);

 register_buffer_object_CUDA(vbo);

// Sets up the TBO ////////////////

glGenBuffers(1,&tbo);

 glBindBuffer(GL_TEXTURE_BUFFER_EXT, tbo);

 glBufferData(GL_TEXTURE_BUFFER_EXT, tbo_buffer_bytesize, tbo_data_ptr, GL_STATIC_READ);

 glBindBuffer(GL_TEXTURE_BUFFER_EXT, 0);

 register_buffer_object_CUDA(tbo);

// calls the CUDA program ////////////////

 float4* tbo_ptr;

 map_buffer_object_CUDA((void**)&tbo_ptr, tbo);

float3* vbo_ptr;

 map_buffer_object_CUDA((void**)&vbo_ptr, vbo);

// >>>>> at this point I call the CUDA program <<<<<

unmap_buffer_object_CUDA(vbo);

 unmap_buffer_object_CUDA(tbo);

[/codebox]

=== CUDA CODE ===

[codebox]

 var1 = function1(vbo, tbo);

 var2 = function1(var1, vbo, tbo);

 var3 = function1(var2, vbo, tbo);

 var4 = function1(var3, vbo, tbo);

 var5 = function1(var4, vbo, tbo);

 var6 = function1(var5, vbo, tbo);

 var7 = function1(var6, vbo, tbo);

vbo = var7;

[/codebox]

So… if my CUDA program is just:

[codebox]

 var1 = function1(vbo, tbo);

 var2 = function1(var1, vbo, tbo);

 var3 = function1(var2, vbo, tbo);

vbo = var3;

[/codebox]

it works… Now, if the program is like:

[codebox]

 var1 = function1(vbo, tbo);

 var2 = function1(var1, vbo, tbo);

 var3 = function1(var2, vbo, tbo);

 var4 = function1(var3, vbo, tbo);

 var5 = function1(var4, vbo, tbo);

 var6 = function1(var5, vbo, tbo);

 var7 = function1(var6, vbo, tbo);

vbo = var7;

[/codebox]

it doesn't work any more. Again… the program does not halt, it just seems that the CUDA program is never fired: my output VBO is exactly equal to the input VBO!!
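A kernel that seems never to run can be narrowed down by checking the launch status, because a kernel launch is asynchronous and does not return an error code by itself. The following is only a minimal sketch, not a diagnosis of this particular case: `chain_kernel` and `check` are hypothetical names, the buffers are plain `cudaMalloc` allocations standing in for the mapped VBO and TBO pointers, and the kernel body is a placeholder for the real chain of equations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: reads the TBO, updates the VBO.
__global__ void chain_kernel(float3* vbo, const float4* tbo, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        vbo[i].x += tbo[i].x;
    }
}

// Hypothetical helper: prints the CUDA error string and reports success/failure.
static bool check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1024;
    float3* vbo_ptr = NULL;  // assumption: stands in for the mapped VBO pointer
    float4* tbo_ptr = NULL;  // assumption: stands in for the mapped TBO pointer

    if (!check(cudaMalloc((void**)&vbo_ptr, n * sizeof(float3)), "cudaMalloc(vbo)")) return 1;
    if (!check(cudaMalloc((void**)&tbo_ptr, n * sizeof(float4)), "cudaMalloc(tbo)")) return 1;
    check(cudaMemset(vbo_ptr, 0, n * sizeof(float3)), "cudaMemset(vbo)");
    check(cudaMemset(tbo_ptr, 0, n * sizeof(float4)), "cudaMemset(tbo)");

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    chain_kernel<<<blocks, threads>>>(vbo_ptr, tbo_ptr, n);

    // Errors detected at launch time (for example an invalid configuration or
    // too many resources requested) are reported by cudaGetLastError().
    bool ok = check(cudaGetLastError(), "kernel launch");

    // Errors raised while the kernel runs only show up after synchronizing.
    // Older toolkits use cudaThreadSynchronize() instead of cudaDeviceSynchronize().
    ok = check(cudaDeviceSynchronize(), "kernel execution") && ok;

    cudaFree(vbo_ptr);
    cudaFree(tbo_ptr);
    return ok ? 0 : 1;
}
```

If the launch itself fails, the output buffer is left untouched, which would look exactly like a kernel that was never called. The same checks can be placed around the register, map and unmap calls for the buffer objects.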

I made a test: I created a hard-coded array inside the CUDA program, with the same values I have in the TBO, and used only this hard-coded array for the calculations: everything worked PERFECTLY!!!

It seems to be some issue related to the use of multiple Buffer Objects in CUDA…

Just in case: my video card is a GeForce 8400 (notebook). I did not try this code with other video cards.

Am I missing something? Do I have to “unlock/register/map” something before using multiple buffers in CUDA?

Just a reminder (in case someone suggests I use a different approach): I am using a TBO because its data, just after the CUDA step, will be read again by a GLSL Fragment program (the TBO data is always READ-ONLY… in CUDA or in the GLSL Fragment program)!!

Again, thanks for any help!!!

 Capagot

Is there any chance that the VBO and the TBO have the same ID? If so, cudaRegister will fail.

Smat.

Hi Smart!

I checked the IDs: vbo = 1 and tbo = 2.

And even with different IDs, the program fails if I use the TBO, and works if I use the hard-coded array.

I'm running this on Linux: Ubuntu Gutsy 8.04

The driver I am using is: NVIDIA-Linux-x86-177.73-pkg1.run