CUDA and OpenGL data transfer

levicki · September 29, 2007, 3:37pm

Can someone from NVIDIA please explain the most efficient way of transferring data from pinned host memory to a 3D OpenGL texture using CUDA?

Texture is in the following format (w x h x d):
glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA8, w, h, d, 0, GL_RGBA, GL_BYTE, 0);

Please help.

levicki · October 1, 2007, 6:23pm

Is this forum dead?

prkipfer · October 2, 2007, 9:09am

No, but you should have read the manual carefully:

3D textures are not supported (as of the current release, 1.0)
CUDA and graphics APIs are kept in different contexts → no data sharing
See the manual section about graphics interop

In summary, it does not make sense to upload a 3D texture to OpenGL using CUDA. Use the PBO transfer instead.

Peter

levicki · October 2, 2007, 9:46am

Well, after I saw a bandwidthBench it made sense to me, because the 2.5 GB/sec I am getting with CUDA transfers is considerably faster than the 850 MB/sec I am getting using OpenGL. That is why I asked if there is a way.

Exactly. I am trying to do it but I can’t make it work with 3D textures. Basically I modified Dominik’s tutorial to work with 3D textures but it doesn’t work as it should:

// w = width, h = height, z goes from 0 to d (depth)

glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, z, w, h, 1, GL_RGBA, GL_UNSIGNED_BYTE, 0);

It should copy from the PBO (which I update prior to this call) to the proper position in the 3D texture, but it always overwrites the slice at z=0; in other words, it ignores the z offset completely. Here is the code I am having problems with. Any advice?

#define GLEW_STATIC
#define GLUT_STATIC_LIB

#pragma comment(lib, "advapi32.lib")
#pragma comment(lib, "glew32s.lib")
#pragma comment(lib, "glutstatic.lib")

#include <stdio.h>
#include <string.h>   // memcpy
#include <windows.h>
#include <GL/glew.h>
#include <GL/glut.h>

#define valloc(size) VirtualAlloc(NULL, (size), MEM_COMMIT, PAGE_READWRITE)
#define vfree(ptr)   VirtualFree(ptr, 0, MEM_RELEASE)

// Returns the CPU clock rate in MHz as stored in the registry.
static DWORD CPUFrequency(void)
{
   DWORD freq;
   HKEY hKey;
   // Backslashes must be escaped inside a C string literal.
   const char *key = "HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0";
   DWORD buflen = 4;

   RegOpenKeyExA(HKEY_LOCAL_MACHINE, key, 0, KEY_READ, &hKey);
   RegQueryValueExA(hKey, "~Mhz", NULL, NULL, (LPBYTE)&freq, &buflen);
   RegCloseKey(hKey);
   return freq;
}

// Reads the time stamp counter (32-bit x86, MSVC inline assembly; result in EDX:EAX).
static __declspec(naked) unsigned __int64 ReadTSC(void)
{
   __asm   {
      rdtsc
      ret
   }
}

int main(int argc, char *argv[])
{
   const int w = 512, h = 512, d = 256;
   int   frame_size = w * h * sizeof(float);   // 4 bytes per texel, same size as one RGBA8 slice
   int   data_size = frame_size * d;

   glutInit(&argc, argv);
   glutCreateWindow("STREAMING TUTORIAL");
   glewInit();

   glMatrixMode(GL_PROJECTION);
   glLoadIdentity();
   glOrtho(0, w, 0, h, -1, 1);
   glMatrixMode(GL_MODELVIEW);
   glLoadIdentity();
   glViewport(0, 0, w, h);

   float   *data1 = (float*)valloc(data_size);

   GLuint   texture3D;

   glGenTextures(1, &texture3D);
   glBindTexture(GL_TEXTURE_3D, texture3D);
   glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_BORDER);
   glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_BORDER);
   glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_BORDER);
   glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
   glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
   // Allocate texture storage only; no data is supplied here.
   glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA8, w, h, d, 0, GL_RGBA, GL_BYTE, 0);

   GLuint   buffer;

   glGenBuffers(1, &buffer);
   glFinish();

   unsigned __int64   t0, t1;
   double         tt, freq = CPUFrequency();

   t0 = ReadTSC();
   for (int z = 0; z < d; z++) {
      glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, buffer);
      // Request fresh buffer storage for each slice.
      glBufferData(GL_PIXEL_UNPACK_BUFFER_ARB, frame_size, NULL, GL_STREAM_DRAW);
      unsigned char *data_ptr = (unsigned char *)data1 + z * (frame_size);
      float *mem = (float*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY);
      if (mem == NULL) {
         DebugBreak();
      }
      memcpy(mem, data_ptr, frame_size);
      glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER_ARB);
      // With a PBO bound, the last argument is a byte offset into the buffer.
      glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, z, w, h, 1, GL_RGBA, GL_UNSIGNED_BYTE, 0);
      glBindBuffer(GL_PIXEL_UNPACK_BUFFER_ARB, 0);
   }
   glFinish();
   t1 = ReadTSC();

   // cycles / MHz = microseconds; bytes / microsecond = MB/sec
   tt = (double)(t1 - t0) / freq;
   printf("2D = %.3f ms, %.2f MB/sec\n", tt / 1000.0, (double)data_size / tt);

   glDeleteBuffers(1, &buffer);
   glDeleteTextures(1, &texture3D);

   vfree(data1);

   return 0;
}

What am I doing wrong?
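
As a sanity check on the figures printed by the listing above: the registry value ~MHz is the clock rate in MHz, so dividing a TSC cycle count by it gives microseconds, and bytes per microsecond is numerically the same as MB/sec (taking 1 MB as 10^6 bytes). Below is a minimal sketch of that arithmetic in plain C, assuming the TSC ticks at the reported clock rate. The helper names cycles_to_us and mb_per_sec are illustrative and not part of the original program.

```c
#include <assert.h>

/* Convert a TSC cycle count to microseconds.
   Assumption: the TSC ticks at cpu_mhz million cycles per second. */
double cycles_to_us(unsigned long long cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}

/* Bytes per microsecond equals megabytes (10^6 bytes) per second. */
double mb_per_sec(double bytes, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return bytes / elapsed_us;
}
```

With hypothetical numbers, 750,000,000 cycles on a 2400 MHz CPU is 312,500 microseconds, and moving the 268,435,456 bytes of a 512 x 512 x 256 RGBA8 volume in that time works out to roughly 859 MB/sec.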

levicki · October 2, 2007, 1:30pm

Someone please help, this is kind of urgent. Thanks in advance.

levicki · October 3, 2007, 2:25pm

Pretty please?

bart_k · October 3, 2007, 3:09pm

It seems your question has turned from CUDA into general shader-based GPGPU, so you might have more luck in a general OpenGL forum or one of the forums at GPGPU.org.

HTH
Bart

levicki · October 5, 2007, 10:14pm

Yes it changed, I thought that it would be possible to use efficient memory transfer provided by CUDA to fill the 3D texture which I could then use from OpenGL+Cg. Seems that I overestimated CUDA, in its current incarnation it is just another useless language. No support for 3D textures, pfff…

I tried at gpgpu.org but that forum appears dead. Either that or they don’t like my question.

I hoped someone from NVIDIA could help me realize what I am doing wrong. I develop exclusively for their high-end cards.

paulius · October 5, 2007, 11:02pm

If the PBO approach doesn’t give you the performance CUDA can, you can try the following:

1. Set up the OpenGL PBO and texture.
2. Copy the data from CPU to GPU memory (allocated via cudaMalloc).
3. Register and map the PBO to CUDA.
4. Launch a CUDA kernel to copy the data from the cudaMalloc'ed memory to the mapped PBO.
5. Unmap the PBO from CUDA.
6. Update the texture.

Step 4 will involve a data movement on the device, which you should be able to avoid by going the OpenGL-only PBO path. The bandwidth for device-to-device copies is very high, though (up to 86 GB/s, vs. 4 GB/s for PCIe). At any rate, the significantly lower PBO transfer rate in OpenGL is either an app issue or maybe a driver issue.

Paulius
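
To put the cost of the extra device-to-device copy in step 4 into perspective, here is a rough estimate in plain C. It assumes the peak figures quoted above (86 GB/s on the device, 4 GB/s over PCIe, with 1 GB taken as 10^9 bytes); real transfers will be slower on both sides. The function names transfer_ms and extra_copy_overhead are illustrative only.

```c
#include <assert.h>

/* Time in milliseconds to move `bytes` at `gb_per_sec` (1 GB = 10^9 bytes). */
double transfer_ms(double bytes, double gb_per_sec)
{
    assert(gb_per_sec > 0.0);
    return bytes / (gb_per_sec * 1e9) * 1e3;
}

/* Time of the device-to-device copy (step 4) relative to the
   host-to-device copy (step 2) for the same number of bytes. */
double extra_copy_overhead(double pcie_gb_per_sec, double d2d_gb_per_sec)
{
    assert(pcie_gb_per_sec > 0.0 && d2d_gb_per_sec > 0.0);
    return pcie_gb_per_sec / d2d_gb_per_sec;
}
```

For a 512 x 512 x 256 RGBA8 volume (268,435,456 bytes) this gives about 67 ms for the PCIe copy and about 3 ms for the device copy, so under these assumptions the extra step adds less than 5% to the transfer time.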

levicki · October 6, 2007, 3:15pm

The problem is that I can’t make the PBO work with 3D textures. Only the first slice gets updated, over and over (the z offset is being ignored). Can you help me understand what I am doing wrong?
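
For checking the result, the intended effect of the per-slice update can be modelled on the CPU: a glTexSubImage3D call with zoffset = z and depth = 1 should replace exactly slice z and leave every other slice untouched. Below is a minimal reference sketch in plain C, assuming a tightly packed RGBA8 volume. The function names slice_offset and upload_slice_reference are illustrative, and comparing this model against a glGetTexImage readback is a suggestion rather than something done in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset of slice z in a tightly packed volume whose slices are
   w x h texels of bytes_per_texel bytes each (4 for RGBA8). */
size_t slice_offset(size_t w, size_t h, size_t bytes_per_texel, size_t z)
{
    return z * w * h * bytes_per_texel;
}

/* CPU model of updating one full slice of a w x h x d RGBA8 volume:
   only the bytes belonging to slice z are overwritten. */
void upload_slice_reference(unsigned char *volume, const unsigned char *slice,
                            size_t w, size_t h, size_t d, size_t z)
{
    assert(volume != NULL && slice != NULL);
    assert(z < d);
    memcpy(volume + slice_offset(w, h, 4, z), slice, w * h * 4);
}
```

Filling each source slice with a distinct byte value makes it easy to see whether every slice of the texture ended up at its own z position or whether all updates landed at z=0.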
