OpenGL interop: Reading from and writing to surface

Clarence · December 14, 2015, 6:37pm

Hey Guys,

I’ve been trying to use open GL and Surface references to read and write to a 3Dtexture via the surface reference API for a long time now.
I managed to find examples online, like:

or this post from a while ago:

but nothing seems to work for me.
After changing the the example code from the first link: https://drive.google.com/file/d/0B61Vxw4WozyLYmE4ZjgyNTgtZDExZS00M2E1LTljOTAtMjYxMzg4ODQ0Nzc1/view
like stated in the second link, it compiles but apparently doesnt write to the cuda Array.

here is part of my code:

glGenTextures(1, &texID);
	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);

		glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA32F, textureDim.x, textureDim.y, textureDim.z, 0, GL_RGBA, GL_FLOAT, NULL);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	CHECK_ERROR_GL();

	// register Image (texture) to CUDA Resource
	CHECK_CUDA(cudaGraphicsGLRegisterImage(&cuda_image_resource, 
                                           texID, GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore));

	// map CUDA resource
	CHECK_CUDA(cudaGraphicsMapResources(1, &cuda_image_resource, 0));
	{
		
		//Get mapped array
		CHECK_CUDA(cudaGraphicsSubResourceGetMappedArray(&writeArray, cuda_image_resource, 0, 0));
		launch_kernel(&writeArray, textureDim);
	}
	CHECK_CUDA(cudaGraphicsUnmapResources(1, &cuda_image_resource, 0));

Kernel:

__global__
void kernel(cudaSurfaceObject_t surfaceWrite,dim3 texture_dim)
{
	int x = blockIdx.x*blockDim.x + threadIdx.x;
	int y = blockIdx.y*blockDim.y + threadIdx.y;
	int z = blockIdx.z*blockDim.z + threadIdx.z;

	if(x >= texture_dim.x || y >= texture_dim.y || z >= texture_dim.z)
	{
		return;
	}

	float4 element = make_float4(0.2f, 1.0f, 1.0f, 1.0f);
	surf3Dwrite(element, surfaceWrite, x*sizeof(float4), y, z);
}

extern "C"
	void launch_kernel(cudaArray_t *writeTo, dim3 texture_dim)
{
	dim3 block_dim(8, 8, 8);
	dim3 grid_dim(texture_dim.x/block_dim.x, texture_dim.y/block_dim.y, texture_dim.z/block_dim.z);
 

	   struct cudaResourceDesc description;
   memset(&description, 0, sizeof(description));
   description.resType = cudaResourceTypeArray;
   description.res.array.array = *writeTo;

   cudaSurfaceObject_t write;
   cudaCreateSurfaceObject(&write, &description);


   kernel<<< grid_dim, block_dim >>>(write,texture_dim);
   cudaDestroySurfaceObject(write);

}

This function checks whether or not writing was successful:

int numElements = textureDim.x*textureDim.y*textureDim.z*4;
	float *data = new float[numElements];

	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glGetTexImage(GL_TEXTURE_3D, 0, GL_RGBA32F, GL_FLOAT, data);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	bool fail = false;
	for(int i = 0; i < numElements && !fail; i++)
	{
		if(data[i] != 1.0f)
		{
			printf("%f\n",data[i] );

			cerr << "Not 1.0f, failed writing to texture" << endl;
			fail = true;
		}
	}
	if(!fail)
	{
		cerr << "All Elements == 1.0f, texture write successful" << endl;
	}

	delete [] data;

Best regards,
Glenn

Robert_Crovella · December 14, 2015, 8:49pm

I would suggest

providing your whole code
explaining more clearly what you mean by “apparently doesnt write to the cuda Array.” Do you mean your check function failed? If so, what was the output?
are you writing 4 floats, one of which is 0.2f:

float4 element = make_float4(0.2f, 1.0f, 1.0f, 1.0f);

and then later checking to see if all floats are 1.0f ?

if(data[i] != 1.0f)

Clarence · December 15, 2015, 10:47am

Hello,

thanks for your answer.
The output is zero, disregarding the number I put into the surfacewrite.
The 0.2f is an artefact of all the experimentation I did so far.

Here is my whole code:

#ifdef _WIN32
#  define WINDOWS_LEAN_AND_MEAN
#  define NOMINMAX
#  include <windows.h>
#endif

#include <iostream>

#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

//Global scope surface to bind to
surface<void, cudaSurfaceType3D> surfaceWrite;

///////////////////////////////////////////////////////////////////////////////
//! Simple kernel to just write something to the texture
///////////////////////////////////////////////////////////////////////////////
__global__
void kernel(cudaSurfaceObject_t surfaceWrite,dim3 texture_dim)
{
	int x = blockIdx.x*blockDim.x + threadIdx.x;
	int y = blockIdx.y*blockDim.y + threadIdx.y;
	int z = blockIdx.z*blockDim.z + threadIdx.z;

	if(x >= texture_dim.x || y >= texture_dim.y || z >= texture_dim.z)
	{
		return;
	}

	float4 element = make_float4(1.0f, 1.0f, 1.0f, 1.0f);
	surf3Dwrite(element, surfaceWrite, x*sizeof(float4), y, z);
}

extern "C"
	void launch_kernel(cudaArray_t *writeTo, dim3 texture_dim)
{
	dim3 block_dim(8, 8, 8);
	dim3 grid_dim(texture_dim.x/block_dim.x, texture_dim.y/block_dim.y, texture_dim.z/block_dim.z);
 

	struct cudaResourceDesc description;
   memset(&description, 0, sizeof(description));
   description.resType = cudaResourceTypeArray;
   description.res.array.array = *writeTo;

   cudaSurfaceObject_t write;
   cudaCreateSurfaceObject(&write, &description);


   kernel<<< grid_dim, block_dim >>>(write,texture_dim);
   cudaDestroySurfaceObject(write);

}

#ifdef _WIN32
#  define WINDOWS_LEAN_AND_MEAN
#  define NOMINMAX
#  include <windows.h>
#endif


// OpenGL Graphics includes
#include <GL/glew.h>
#if defined (__APPLE__) || defined(MACOSX)
#include <GLUT/glut.h>
#else
#include <GL/freeglut.h>
#endif

// CUDA utilities and system includes
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>

// includes, system
#include <iostream>

using namespace std;

#define REFRESH_DELAY	  10 //ms


////////////////////////////////////////////////////////////////////////////////
// constants
const unsigned int window_width  = 512;
const unsigned int window_height = 512;


GLuint texID;
cudaGraphicsResource *cuda_image_resource;
cudaArray            *cuda_image_array;
cudaArray_t writeArray;

dim3 textureDim(128, 128, 128);


extern "C" 
void launch_kernel( cudaArray_t *writeTo, dim3 texture_dim);

////////////////////////////////////////////////////////////////////////////////
// declaration, forward


bool initGL(int *argc, char** argv);
void initCUDA();

void checkTex();

void runCudaTest();

// rendering callbacks
void display();
void keyboard(unsigned char key, int x, int y);
void timerEvent(int value);

void CHECK_CUDA(cudaError_t err) {
    if(err != cudaSuccess) {
        std::cerr << "Error: " << cudaGetErrorString(err) << std::endl;
        exit(-1);
    }
}  

void CHECK_ERROR_GL() {
    GLenum err = glGetError();
    if(err != GL_NO_ERROR) {
        std::cerr << "GL Error: " << gluErrorString(err) << std::endl;
        exit(-1);
    }
}


void display(){
}


////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int main(int argc, char** argv)
{
	initGL(&argc, argv);
	initCUDA();

	glGenTextures(1, &texID);
	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST        );
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_S,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);
		glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_T,     GL_CLAMP_TO_BORDER);

		glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA32F, textureDim.x, textureDim.y, textureDim.z, 0, GL_RGBA, GL_FLOAT, NULL);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	CHECK_ERROR_GL();

	// register Image (texture) to CUDA Resource
	CHECK_CUDA(cudaGraphicsGLRegisterImage(&cuda_image_resource, 
                                           texID, GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore));

	// map CUDA resource
	CHECK_CUDA(cudaGraphicsMapResources(1, &cuda_image_resource, 0));
	{
		
		//Get mapped array
		CHECK_CUDA(cudaGraphicsSubResourceGetMappedArray(&writeArray, cuda_image_resource, 0, 0));
		launch_kernel(&writeArray, textureDim);
	}
	CHECK_CUDA(cudaGraphicsUnmapResources(1, &cuda_image_resource, 0));

	checkTex();

	CHECK_CUDA(cudaGraphicsUnregisterResource(cuda_image_resource));

	glDeleteTextures(1, &texID);
	
	CHECK_CUDA(cudaDeviceReset());
}


////////////////////////////////////////////////////////////////////////////////
//! Initialize GL
////////////////////////////////////////////////////////////////////////////////
bool initGL(int *argc, char **argv)
{
    glutInit(argc, argv);
    glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
    glutInitWindowSize(window_width, window_height);
    glutCreateWindow("CUDA GL 3D Texture Surface Write");
    glutDisplayFunc(display);
    glutKeyboardFunc(keyboard);
	glutTimerFunc(REFRESH_DELAY, timerEvent,0);

	// initialize necessary OpenGL extensions
    glewInit();
    if (! glewIsSupported("GL_VERSION_2_0 "))
	{
        fprintf(stderr, "ERROR: Support for necessary OpenGL extensions missing.");
        fflush(stderr);
        return false;
    }

    // default initialization
    glClearColor(0.0, 0.0, 0.0, 1.0);
    glDisable(GL_DEPTH_TEST);

    // viewport
    glViewport(0, 0, window_width, window_height);

    // projection
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, (GLfloat)window_width / (GLfloat) window_height, 0.1, 10.0);

    CHECK_ERROR_GL();

    return true;
}

void initCUDA()
{
	int deviceCount;
	CHECK_CUDA(cudaGetDeviceCount(&deviceCount));

	cerr << "CUDA device count: " << deviceCount << endl;
	int device = 0; //SELECT GPU HERE
	cerr << "Selecting GPU: " << device << endl;
	CHECK_CUDA(cudaSetDevice(device));
	CHECK_CUDA(cudaGLSetGLDevice( device ));
}


////////////////////////////////////////////////////////////////////////////////
//! Run the CUDA test
////////////////////////////////////////////////////////////////////////////////
void runCudaTest()
{
	// map CUDA resource
	CHECK_CUDA(cudaGraphicsMapResources(1, &cuda_image_resource, 0));
	{
		
		//Get mapped array
		CHECK_CUDA(cudaGraphicsSubResourceGetMappedArray(&writeArray, cuda_image_resource, 0, 0));
		launch_kernel(&writeArray, textureDim);
	}
	CHECK_CUDA(cudaGraphicsUnmapResources(1, &cuda_image_resource, 0));
}


////////////////////////////////////////////////////////////////////////////////
//! Check Texture
////////////////////////////////////////////////////////////////////////////////
void checkTex()
{
	int numElements = textureDim.x*textureDim.y*textureDim.z*4;
	float *data = new float[numElements];

	glBindTexture(GL_TEXTURE_3D, texID);
	{
		glGetTexImage(GL_TEXTURE_3D, 0, GL_RGBA32F, GL_FLOAT, data);
	}
	glBindTexture(GL_TEXTURE_3D, 0);

	bool fail = false;
	for(int i = 0; i < numElements && !fail; i++)
	{
		if(data[i] != 1.0f)
		{

			cerr << "Not 1.0f, failed writing to texture" << endl;
			fail = true;
		}
	}
	if(!fail)
	{
		cerr << "All Elements == 1.0f, texture write successful" << endl;
	}

	delete [] data;
}

////////////////////////////////////////////////////////////////////////////////
//! Display callback
////////////////////////////////////////////////////////////////////////////////

void timerEvent(int value)
{
    glutPostRedisplay();
	glutTimerFunc(REFRESH_DELAY, timerEvent,0);
}


////////////////////////////////////////////////////////////////////////////////
//! Keyboard events handler
////////////////////////////////////////////////////////////////////////////////
void keyboard(unsigned char key, int /*x*/, int /*y*/)
{
    switch(key) {
    case(27) :
        exit(0);
        break;
    }
}

Robert_Crovella · December 15, 2015, 5:50pm

You’ve mashed surface objects and surface references together. You don’t want to do that. It should be sufficient to use one or the other.

What happens if you just use the code as-is from the google drive link? (which uses surface references)

Clarence · December 15, 2015, 8:24pm

It still just gives me zeros.
It also seems weird to me, but that’s what the guy states in the second link, saying that it is necessary to use the surface object from cuda 5 on:

"The bad news…

Unfortunately, if you try to write to a globally scoped CUDA surface from a device-side launched kernel (i.e. a dynamic kernel), nothing will happen. You’ll scratch your head and wonder why code that works perfectly fine when launched from the host-side, fails silently when launched device-side.

I only discovered the reason when I decided to read, word for word, the CUDA Dynamic Parallelism Programming Guide. On page 14, in the “Textures & Surfaces” section is this note:

NOTE: The device runtime does not support legacy module-scope (i.e. Fermi-style)
textures and surfaces within a kernel launched from the device. Module-scope (legacy)
textures may be created from the host and used in device code as for any kernel, but
may only be used by a top-level kernel (i.e. the one which is launched from the host)."

Best Regards,
Glenn

Robert_Crovella · December 15, 2015, 10:04pm

No, he’s not stating you must surface objects from CUDA 5 on. He’s stating you must use surface objects if you want to access the surface from a device-launched kernel. That does not apply to your case, or your code, so it is still acceptable to use a surface reference. (Note that immediately prior to the section you have quoted, there is this statement: “The old methods still work,”)

Furthermore, in your code, you have mashed together surface object usage with surface reference usage. This:

//Global scope surface to bind to
surface<void, cudaSurfaceType3D> surfaceWrite;

is a surface reference. If you were using strictly surface objects, no such global reference would be needed, and the fact that you’ve given it the same name as the kernel parameter and the name of the surface objects you’re trying to use, it’s quite confusing.

However I’ve taken the original code and removed the cutil stuff, and it is also not working for me. At the moment I’m stumped. If I discover anything further, I’ll report back here.

Clarence · December 15, 2015, 10:28pm

Apparently the devil is hidden in the details, but I don’t quite get why that is not the case?
Intuitively I would say that I’m accessing the surface via a device-launched kernel.

Robert_Crovella · December 16, 2015, 1:06am

You’ll need to better understand CUDA dynamic parallelism. All kernels run on the device, of course.

If the kernel is launched from host code, it is a host-launched kernel. All of the kernels we have been discussing in any of the codes I’ve seen connected to this thread are host-launched.

If the kernel is launched from device code (i.e. the kernel launch occurs from code executing in a kernel) then it is a device-launched kernel.

Clarence · December 16, 2015, 11:07am

Okay, I’m no native probabyly that’s why i didn’t quite get the exact meaning.

A possible error might be due to the type the texture expects the number to be in.
Similar to what he wrote:
“The texture is generated as a GL_RGBA texture containing GL_FLOAT values. That is why I tried to write a float4 value into the texture. It turns out that the OpenGL texture seems to assume 4 byte for each color so writing 4 floats (or 16 bytes) into the texture is just wrong. Although i assumed the texture to be semantically “4 floats each color channel” i had to write a uchar4 (4 byte) to each pixel (see code below).”

In this case we are using GL_RGBA32F, so it should be fine. Also I’ve tried using GL_RGBA and uchar4 but it still gives me zeros.

Topic		Replies	Views
Reading and Writing OpenGL Textures with Cuda CUDA Programming and Performance	24	24916	June 26, 2023
Passing source pointer from OpenGL texture to cuda kernel? CUDA Programming and Performance	2	3305	October 19, 2017
Reading R8G8B8A8 texture using tex2D() causes strange result. CUDA Programming and Performance	27	2865	April 28, 2018
CUDA Interop for depth component, work-around. CUDA Programming and Performance	0	1842	December 21, 2014
CUDA and OpenGL - Beginner question CUDA Programming and Performance	15	4463	March 6, 2015
Access Violation caused by "cudaGraphicsMapResources" CUDA Programming and Performance	5	1818	September 20, 2018
render to Texture help needed I need help with rendering with cuda to a OPenGL usable texture CUDA Programming and Performance	9	2365	September 28, 2010
QGLWidget + OpenGL + CUDA CUDA Programming and Performance	3	11355	October 20, 2010
CUDA very slow performance CUDA Programming and Performance	21	16698	March 6, 2020
OpenGL interoperability issue with 1 channel float texture. CUDA Programming and Performance	2	2724	September 15, 2010

OpenGL interop: Reading from and writing to surface

Related topics