OpenGL interop: Reading from and writing to surface

Hey Guys,

I’ve been trying to use open GL and Surface references to read and write to a 3Dtexture via the surface reference API for a long time now.
I managed to find examples online, like:
http://rauwendaal.net/2011/12/02/writing-to-3d-opengl-textures-in-cuda-4-1-with-3d-surface-writes/
http://rauwendaal.net/2013/04/03/cuda-5-and-opengl-interop/

or this post from a while ago:
https://devtalk.nvidia.com/default/topic/659164/reading-and-writing-opengl-textures-with-cuda/

but nothing seems to work for me.
After changing the the example code from the first link: https://drive.google.com/file/d/0B61Vxw4WozyLYmE4ZjgyNTgtZDExZS00M2E1LTljOTAtMjYxMzg4ODQ0Nzc1/view
like stated in the second link, it compiles but apparently doesnt write to the cuda Array.

here is part of my code:

glGenTextures(1, &texID);
	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);

		glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA32F, textureDim.x, textureDim.y, textureDim.z, 0, GL_RGBA, GL_FLOAT, NULL);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	CHECK_ERROR_GL();

	// register Image (texture) to CUDA Resource
	CHECK_CUDA(cudaGraphicsGLRegisterImage(&cuda_image_resource, 
                                           texID, GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore));

	// map CUDA resource
	CHECK_CUDA(cudaGraphicsMapResources(1, &cuda_image_resource, 0));
	{
		
		//Get mapped array
		CHECK_CUDA(cudaGraphicsSubResourceGetMappedArray(&writeArray, cuda_image_resource, 0, 0));
		launch_kernel(&writeArray, textureDim);
	}
	CHECK_CUDA(cudaGraphicsUnmapResources(1, &cuda_image_resource, 0));

Kernel:

__global__
void kernel(cudaSurfaceObject_t surfaceWrite,dim3 texture_dim)
{
	int x = blockIdx.x*blockDim.x + threadIdx.x;
	int y = blockIdx.y*blockDim.y + threadIdx.y;
	int z = blockIdx.z*blockDim.z + threadIdx.z;

	if(x >= texture_dim.x || y >= texture_dim.y || z >= texture_dim.z)
	{
		return;
	}

	float4 element = make_float4(0.2f, 1.0f, 1.0f, 1.0f);
	surf3Dwrite(element, surfaceWrite, x*sizeof(float4), y, z);
}

extern "C"
	void launch_kernel(cudaArray_t *writeTo, dim3 texture_dim)
{
	dim3 block_dim(8, 8, 8);
	dim3 grid_dim(texture_dim.x/block_dim.x, texture_dim.y/block_dim.y, texture_dim.z/block_dim.z);
 

	   struct cudaResourceDesc description;
   memset(&description, 0, sizeof(description));
   description.resType = cudaResourceTypeArray;
   description.res.array.array = *writeTo;

   cudaSurfaceObject_t write;
   cudaCreateSurfaceObject(&write, &description);


   kernel<<< grid_dim, block_dim >>>(write,texture_dim);
   cudaDestroySurfaceObject(write);

}

This function checks whether or not writing was successful:

int numElements = textureDim.x*textureDim.y*textureDim.z*4;
	float *data = new float[numElements];

	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glGetTexImage(GL_TEXTURE_3D, 0, GL_RGBA32F, GL_FLOAT, data);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	bool fail = false;
	for(int i = 0; i < numElements && !fail; i++)
	{
		if(data[i] != 1.0f)
		{
			printf("%f\n",data[i] );

			cerr << "Not 1.0f, failed writing to texture" << endl;
			fail = true;
		}
	}
	if(!fail)
	{
		cerr << "All Elements == 1.0f, texture write successful" << endl;
	}

	delete [] data;

Best regards,
Glenn

I would suggest

  1. providing your whole code
  2. explaining more clearly what you mean by “apparently doesnt write to the cuda Array.” Do you mean your check function failed? If so, what was the output?
  3. are you writing 4 floats, one of which is 0.2f:
float4 element = make_float4(0.2f, 1.0f, 1.0f, 1.0f);

and then later checking to see if all floats are 1.0f ?

if(data[i] != 1.0f)

Hello,

thanks for your answer.
The output is zero, disregarding the number I put into the surfacewrite.
The 0.2f is an artefact of all the experimentation I did so far.

Here is my whole code:

#ifdef _WIN32
#  define WINDOWS_LEAN_AND_MEAN
#  define NOMINMAX
#  include <windows.h>
#endif

#include <iostream>

#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

//Global scope surface to bind to
surface<void, cudaSurfaceType3D> surfaceWrite;

///////////////////////////////////////////////////////////////////////////////
//! Simple kernel to just write something to the texture
///////////////////////////////////////////////////////////////////////////////
__global__
void kernel(cudaSurfaceObject_t surfaceWrite,dim3 texture_dim)
{
	int x = blockIdx.x*blockDim.x + threadIdx.x;
	int y = blockIdx.y*blockDim.y + threadIdx.y;
	int z = blockIdx.z*blockDim.z + threadIdx.z;

	if(x >= texture_dim.x || y >= texture_dim.y || z >= texture_dim.z)
	{
		return;
	}

	float4 element = make_float4(1.0f, 1.0f, 1.0f, 1.0f);
	surf3Dwrite(element, surfaceWrite, x*sizeof(float4), y, z);
}

extern "C"
	void launch_kernel(cudaArray_t *writeTo, dim3 texture_dim)
{
	dim3 block_dim(8, 8, 8);
	dim3 grid_dim(texture_dim.x/block_dim.x, texture_dim.y/block_dim.y, texture_dim.z/block_dim.z);
 

	struct cudaResourceDesc description;
   memset(&description, 0, sizeof(description));
   description.resType = cudaResourceTypeArray;
   description.res.array.array = *writeTo;

   cudaSurfaceObject_t write;
   cudaCreateSurfaceObject(&write, &description);


   kernel<<< grid_dim, block_dim >>>(write,texture_dim);
   cudaDestroySurfaceObject(write);

}
#ifdef _WIN32
#  define WINDOWS_LEAN_AND_MEAN
#  define NOMINMAX
#  include <windows.h>
#endif


// OpenGL Graphics includes
#include <GL/glew.h>
#if defined (__APPLE__) || defined(MACOSX)
#include <GLUT/glut.h>
#else
#include <GL/freeglut.h>
#endif

// CUDA utilities and system includes
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// includes, system
#include <iostream>

using namespace std;

#define REFRESH_DELAY	  10 //ms


////////////////////////////////////////////////////////////////////////////////
// constants
const unsigned int window_width  = 512;
const unsigned int window_height = 512;


GLuint texID;
cudaGraphicsResource *cuda_image_resource;
cudaArray            *cuda_image_array;
cudaArray_t writeArray;

dim3 textureDim(128, 128, 128);


extern "C" 
void launch_kernel( cudaArray_t *writeTo, dim3 texture_dim);

////////////////////////////////////////////////////////////////////////////////
// declaration, forward


bool initGL(int *argc, char** argv);
void initCUDA();

void checkTex();

void runCudaTest();

// rendering callbacks
void display();
void keyboard(unsigned char key, int x, int y);
void timerEvent(int value);

void CHECK_CUDA(cudaError_t err) {
    if(err != cudaSuccess) {
        std::cerr << "Error: " << cudaGetErrorString(err) << std::endl;
        exit(-1);
    }
}  

void CHECK_ERROR_GL() {
    GLenum err = glGetError();
    if(err != GL_NO_ERROR) {
        std::cerr << "GL Error: " << gluErrorString(err) << std::endl;
        exit(-1);
    }
}


void display(){
}


////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int main(int argc, char** argv)
{
	initGL(&argc, argv);
	initCUDA();

	glGenTextures(1, &texID);
	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);

		glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA32F, textureDim.x, textureDim.y, textureDim.z, 0, GL_RGBA, GL_FLOAT, NULL);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	CHECK_ERROR_GL();

	// register Image (texture) to CUDA Resource
	CHECK_CUDA(cudaGraphicsGLRegisterImage(&cuda_image_resource, 
                                           texID, GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore));

	// map CUDA resource
	CHECK_CUDA(cudaGraphicsMapResources(1, &cuda_image_resource, 0));
	{
		
		//Get mapped array
		CHECK_CUDA(cudaGraphicsSubResourceGetMappedArray(&writeArray, cuda_image_resource, 0, 0));
		launch_kernel(&writeArray, textureDim);
	}
	CHECK_CUDA(cudaGraphicsUnmapResources(1, &cuda_image_resource, 0));

	checkTex();

	CHECK_CUDA(cudaGraphicsUnregisterResource(cuda_image_resource));

	glDeleteTextures(1, &texID);
	
	CHECK_CUDA(cudaDeviceReset());
}


////////////////////////////////////////////////////////////////////////////////
//! Initialize GL
////////////////////////////////////////////////////////////////////////////////
bool initGL(int *argc, char **argv)
{
    glutInit(argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(window_width, window_height);
    glutCreateWindow("CUDA GL 3D Texture Surface Write");
    glutDisplayFunc(display);
    glutKeyboardFunc(keyboard);
	glutTimerFunc(REFRESH_DELAY, timerEvent,0);

	// initialize necessary OpenGL extensions
    glewInit();
    if (! glewIsSupported("GL_VERSION_2_0 "))
	{
        fprintf(stderr, "ERROR: Support for necessary OpenGL extensions missing.");
        fflush(stderr);
        return false;
    }

    // default initialization
    glClearColor(0.0, 0.0, 0.0, 1.0);
    glDisable(GL_DEPTH_TEST);

    // viewport
    glViewport(0, 0, window_width, window_height);

    // projection
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, (GLfloat)window_width / (GLfloat) window_height, 0.1, 10.0);

    CHECK_ERROR_GL();

    return true;
}

void initCUDA()
{
	int deviceCount;
	CHECK_CUDA(cudaGetDeviceCount(&deviceCount));

	cerr << "CUDA device count: " << deviceCount << endl;
	int device = 0; //SELECT GPU HERE
	cerr << "Selecting GPU: " << device << endl;
	CHECK_CUDA(cudaSetDevice(device));
	CHECK_CUDA(cudaGLSetGLDevice( device ));
}


////////////////////////////////////////////////////////////////////////////////
//! Run the CUDA test
////////////////////////////////////////////////////////////////////////////////
void runCudaTest()
{
	// map CUDA resource
	CHECK_CUDA(cudaGraphicsMapResources(1, &cuda_image_resource, 0));
	{
		
		//Get mapped array
		CHECK_CUDA(cudaGraphicsSubResourceGetMappedArray(&writeArray, cuda_image_resource, 0, 0));
		launch_kernel(&writeArray, textureDim);
	}
	CHECK_CUDA(cudaGraphicsUnmapResources(1, &cuda_image_resource, 0));
}


////////////////////////////////////////////////////////////////////////////////
//! Check Texture
////////////////////////////////////////////////////////////////////////////////
void checkTex()
{
	int numElements = textureDim.x*textureDim.y*textureDim.z*4;
	float *data = new float[numElements];

	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glGetTexImage(GL_TEXTURE_3D, 0, GL_RGBA32F, GL_FLOAT, data);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	bool fail = false;
	for(int i = 0; i < numElements && !fail; i++)
	{
		if(data[i] != 1.0f)
		{

			cerr << "Not 1.0f, failed writing to texture" << endl;
			fail = true;
		}
	}
	if(!fail)
	{
		cerr << "All Elements == 1.0f, texture write successful" << endl;
	}

	delete [] data;
}

////////////////////////////////////////////////////////////////////////////////
//! Display callback
////////////////////////////////////////////////////////////////////////////////

void timerEvent(int value)
{
    glutPostRedisplay();
	glutTimerFunc(REFRESH_DELAY, timerEvent,0);
}


////////////////////////////////////////////////////////////////////////////////
//! Keyboard events handler
////////////////////////////////////////////////////////////////////////////////
void keyboard(unsigned char key, int /*x*/, int /*y*/)
{
    switch(key) {
    case(27) :
        exit(0);
        break;
    }
}

You’ve mashed surface objects and surface references together. You don’t want to do that. It should be sufficient to use one or the other.

What happens if you just use the code as-is from the google drive link? (which uses surface references)

It still just gives me zeros.
It also seems weird to me, but that’s what the guy states in the second link, saying that it is necessary to use the surface object from cuda 5 on:

"The bad news…

Unfortunately, if you try to write to a globally scoped CUDA surface from a device-side launched kernel (i.e. a dynamic kernel), nothing will happen. You’ll scratch your head and wonder why code that works perfectly fine when launched from the host-side, fails silently when launched device-side.

I only discovered the reason when I decided to read, word for word, the CUDA Dynamic Parallelism Programming Guide. On page 14, in the “Textures & Surfaces” section is this note:

NOTE: The device runtime does not support legacy module-scope (i.e. Fermi-style)
textures and surfaces within a kernel launched from the device. Module-scope (legacy)
textures may be created from the host and used in device code as for any kernel, but
may only be used by a top-level kernel (i.e. the one which is launched from the host)."

Best Regards,
Glenn

No, he’s not stating you must surface objects from CUDA 5 on. He’s stating you must use surface objects if you want to access the surface from a device-launched kernel. That does not apply to your case, or your code, so it is still acceptable to use a surface reference. (Note that immediately prior to the section you have quoted, there is this statement: “The old methods still work,”)

Furthermore, in your code, you have mashed together surface object usage with surface reference usage. This:

//Global scope surface to bind to
surface<void, cudaSurfaceType3D> surfaceWrite;

is a surface reference. If you were using strictly surface objects, no such global reference would be needed, and the fact that you’ve given it the same name as the kernel parameter and the name of the surface objects you’re trying to use, it’s quite confusing.

However I’ve taken the original code and removed the cutil stuff, and it is also not working for me. At the moment I’m stumped. If I discover anything further, I’ll report back here.

Apparently the devil is hidden in the details, but I don’t quite get why that is not the case?
Intuitively I would say that I’m accessing the surface via a device-launched kernel.

You’ll need to better understand CUDA dynamic parallelism. All kernels run on the device, of course.

If the kernel is launched from host code, it is a host-launched kernel. All of the kernels we have been discussing in any of the codes I’ve seen connected to this thread are host-launched.

If the kernel is launched from device code (i.e. the kernel launch occurs from code executing in a kernel) then it is a device-launched kernel.

Okay, I’m no native probabyly that’s why i didn’t quite get the exact meaning.

A possible error might be due to the type the texture expects the number to be in.
Similar to what he wrote:
“The texture is generated as a GL_RGBA texture containing GL_FLOAT values. That is why I tried to write a float4 value into the texture. It turns out that the OpenGL texture seems to assume 4 byte for each color so writing 4 floats (or 16 bytes) into the texture is just wrong. Although i assumed the texture to be semantically “4 floats each color channel” i had to write a uchar4 (4 byte) to each pixel (see code below).”

In this case we are using GL_RGBA32F, so it should be fine. Also I’ve tried using GL_RGBA and uchar4 but it still gives me zeros.