CUDA to VBO transfer problems

Stewie · March 9, 2007, 10:46pm

Windows XP, quad core, Quadro FX 5600
Using CUDA 0.8, driver 97.73, SDK 10

Hi,

I am trying to transfer a CUDA array to a VBO.

I create the VBO vboid and do the following to allocate sz bytes:

    void *p = NULL;
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, vboid );
    glBufferDataARB( GL_ARRAY_BUFFER_ARB, sz, p, GL_STREAM_COPY_ARB );
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );

Here, I want p, the data the VBO will be initialized with, to be NULL. However, when I try to register the VBO using cudaGLRegisterBufferObject, it crashes at:
nvoglnt.dll+0007f1dc()

If I allocate a dummy array for p of size sz and pass it in, the registration and the subsequent cudaMemcpy work perfectly. However, I don’t want to have to initialize it, since I am just going to trample it anyway!

Or is there a better way of doing this?

Thanks!

mfatica · March 9, 2007, 11:34pm

It is a known bug (see http://forums.nvidia.com/index.php?showtopic=28830).

The workaround, as you found out, is to allocate a dummy array to pass to glBufferData.
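A minimal sketch of that workaround, written so it can be tried without a GL context. The names alloc_store_with_dummy, upload_fn and record_upload are hypothetical and not part of OpenGL or CUDA; in real code the callback would be replaced by the glBufferDataARB call from the first post. It relies on glBufferData copying the supplied data into the buffer store, so the host array only has to live for the duration of the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback standing in for the real upload call, e.g.
   glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, data, GL_STREAM_COPY_ARB). */
typedef void (*upload_fn)(size_t size, const void *data);

/* Create the buffer store from a zero-filled dummy array instead of NULL.
   Returns 0 on success, -1 if the temporary host allocation fails. */
int alloc_store_with_dummy(size_t sz, upload_fn upload)
{
    assert(upload != NULL && sz > 0);

    void *dummy = calloc(1, sz);   /* zero-filled placeholder contents */
    if (dummy == NULL)
        return -1;

    upload(sz, dummy);             /* the GL call copies the data ...   */
    free(dummy);                   /* ... so the host copy can be freed */
    return 0;
}

/* Stand-in uploader for experimenting without OpenGL: it only records
   what it was called with. */
size_t last_upload_size = 0;
int last_upload_had_data = 0;

void record_upload(size_t size, const void *data)
{
    last_upload_size = size;
    last_upload_had_data = (data != NULL);
}
```

The dummy array costs one short-lived host allocation of sz bytes per VBO, so it only needs to happen once when the buffer is created, not per frame.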

Stewie · March 12, 2007, 9:57pm

Is there something special that needs to be done to synchronize at the end of the transfer?

As a test, I have a VBO that contains the positions of an object I am drawing. I transfer it to CUDA memory, transfer it back to a VBO, and then draw using the VBO.

I am registering, mapping, copying, unmapping and then unregistering.

On the first render I get some garbage; it seems to be a stale version of the data.

If I then redraw later using the same VBO, with no extra copies, I get the correct result.

Is this a known issue? Am I supposed to call some sort of sync function?

Thanks,
Stewie

Stewie · March 22, 2007, 4:17pm

I rejigged my code to reduce the impact of this workaround and as a result smacked into another problem, which I am now working around again.

What I wanted to do was:

void CalculateSomethingSpectacular( unsigned int outputvboid )
{
    void * p = MapVBOToCUDA( outputvboid );
    CalculateKernel( p );
    UnmapVBOFromCUDA( outputvboid );
}

However, if I did this, subsequent cudaMalloc calls etc. would return cudaError 10201 …

So instead, I need to compute into a temporary buffer and then copy the result in:

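void CalculateSomethingSpectacularButSlower( unsigned int outputvboid, void *pTempDeviceBuffer )
{
    /* Compute into the temporary device buffer first. */
    CalculateKernel( pTempDeviceBuffer );

    /* Then map the VBO and copy the result across on the device (sz bytes). */
    void * p = MapVBOToCUDA( outputvboid );
    cudaMemcpy( p, pTempDeviceBuffer, sz, cudaMemcpyDeviceToDevice );
    UnmapVBOFromCUDA( outputvboid );
}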

OK, this works, but it uses extra memory, requires temporary buffer management, and has an extra memcpy. Yes, it is blindingly fast since it is on the card, but it is still unnecessary.

(!) As a side note, I imagine that cudaGLMapBufferObject actually copies the VBO contents to CUDA memory as part of the mapping. It would be nice if the function took an optional parameter “in_bPreserveOriginalData” that could be set to “false” when we don’t really care about the original contents. A post-blur effect would be able to set it to true, so the copy is done, and a kernel that purely generates data, overwriting everything, would just set it to false.

Similarly, the unmap should have such a parameter, so that two things can be done: 1) the caller can signal that although the data was mapped, it never changed and hence does not need to be copied back, and 2) if something failed, there is no point copying bogus data back.

Or is this latter optimization done automatically using some lower level mechanism we don’t see?
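To make the proposal concrete, here is a toy host-side model of the two suggested parameters. Everything in it is hypothetical: toy_buffer, toy_map and toy_unmap are illustration names, plain host memory stands in for both the VBO and the CUDA mapping, and it assumes (as guessed above, not confirmed) that mapping and unmapping each involve a full copy. It only shows how the flags would let a caller skip copies it does not need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a VBO and its mapped CUDA-side copy (host memory only). */
typedef struct {
    unsigned char *vbo;     /* the "OpenGL" copy of the data             */
    unsigned char *mapped;  /* the "CUDA" copy while mapped, else NULL   */
    size_t size;            /* buffer size in bytes                      */
    int copies;             /* how many full-buffer copies were made     */
} toy_buffer;

/* Hypothetical map: copies the VBO contents in only when the caller asks
   for them to be preserved; otherwise the mapping is uninitialized. */
void *toy_map(toy_buffer *b, int preserve_original_data)
{
    assert(b != NULL && b->mapped == NULL);
    b->mapped = (unsigned char *)malloc(b->size);
    if (b->mapped == NULL)
        return NULL;
    if (preserve_original_data) {
        memcpy(b->mapped, b->vbo, b->size);
        b->copies++;
    }
    return b->mapped;
}

/* Hypothetical unmap: copies back only when write_back is nonzero, so an
   unchanged or failed result never overwrites the VBO. */
void toy_unmap(toy_buffer *b, int write_back)
{
    if (b->mapped == NULL)
        return;
    if (write_back) {
        memcpy(b->vbo, b->mapped, b->size);
        b->copies++;
    }
    free(b->mapped);
    b->mapped = NULL;
}
```

In this model a kernel that purely generates data would map with the preserve flag off and unmap with write-back on (one copy instead of two), while a read-only pass would do the reverse.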

Thanks for any comments in advance,

Stewie
