How do I read OpenGL half float?

I’m testing interoperability with different OpenGL textures and I have no idea how to process 16-bit floats.

My kernel function:

// Inverts the color channels of a surface in place.
// T: texel vector type, C: channel count, M: per-channel maximum value.
// size is the texel size in bytes (surf2Dread/surf2Dwrite take a byte offset in x).
template <class T, int C, class M>
__global__ void invert(cudaSurfaceObject_t s, dim3 texDim, int size, M max) {
	unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

	if (x < texDim.x && y < texDim.y) {
		T data = surf2Dread<T>(s, x * size, y);
		T inverted;
		switch (C) {  // intentional fall-through: each case also inverts the lower channels
			case 4:
				inverted.w = data.w;  // keep alpha
				inverted.z = max - data.z;
			case 2:
				inverted.y = max - data.y;
			case 1:
				inverted.x = max - data.x;
		}
		surf2Dwrite(inverted, s, x * size, y);
	}
}

And I call it like this:

dim3 texDim(width, height);
dim3 thread(32, 32);
// round up so the grid also covers textures whose size is not a multiple of the block size
dim3 block((texDim.x + thread.x - 1) / thread.x, (texDim.y + thread.y - 1) / thread.y);

// for GL_RGBA32F (16 bytes per texel)
invert<float4, 4, float><<< block, thread >>>(surface, texDim, 16, 1.f);

// for GL_RGBA8 (4 bytes per texel)
invert<uchar4, 4, unsigned char><<< block, thread >>>(surface, texDim, 4, 255);

How do I make it work with e.g. GL_RGBA16F? There is no struct like “halffloat4”, nor a “halffloat” type.

I’ve tried:

invert<float4, 4, float><<< block, thread >>>(surface, texDim, 8, 1.f);

But it only produces weird green lines.
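
Would reading the raw bits as ushort4 and converting each channel by hand work? Something like this is my guess (untested, and I’m assuming my toolkit’s cuda_fp16.h provides __ushort_as_half/__half_as_ushort):

#include <cuda_fp16.h>

// GL_RGBA16F texels are 4 x 16 bits (8 bytes), so fetch the raw bit patterns
// as ushort4 and convert each channel explicitly
__global__ void invertHalf(cudaSurfaceObject_t s, dim3 texDim) {
	unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;

	if (x < texDim.x && y < texDim.y) {
		ushort4 raw = surf2Dread<ushort4>(s, x * 8, y);  // byte offset: 8 bytes per texel
		ushort4 out;
		out.x = __half_as_ushort(__float2half(1.f - __half2float(__ushort_as_half(raw.x))));
		out.y = __half_as_ushort(__float2half(1.f - __half2float(__ushort_as_half(raw.y))));
		out.z = __half_as_ushort(__float2half(1.f - __half2float(__ushort_as_half(raw.z))));
		out.w = raw.w;  // leave the alpha bits untouched
		surf2Dwrite(out, s, x * 8, y);
	}
}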

I gave an example program for reading fp16 textures here: https://devtalk.nvidia.com/default/topic/547080/cuda-programming-and-performance/-half-datatype-ieee-754-conformance/post/3831088/#3831088.

Note: While this code should still work, there may be better ways to deal with fp16 textures now that CUDA comes with comprehensive fp16 support. Check the CUDA 8 RC documentation.

[Later:] It seems the !#@$ software running this forum has removed the contents between angle brackets from the code I posted. Here are the bits that went missing:

#include <stdlib.h>
#include <stdio.h>

texture<float, 2> tex;
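
From memory, the overall shape of that example was roughly this (a sketch, not the exact code from the old post): describe the device data as half floats with cudaCreateChannelDescHalf(), bind the float texture reference to it, and let the texture hardware convert fp16 to fp32 on every read.

// uses the #includes and the texture<float, 2> tex declared above

__global__ void readHalf(float *out, int width, int height) {
	int x = blockIdx.x * blockDim.x + threadIdx.x;
	int y = blockIdx.y * blockDim.y + threadIdx.y;
	if (x < width && y < height) {
		// the hardware converts the 16-bit texel to a 32-bit float
		out[y * width + x] = tex2D(tex, x + 0.5f, y + 0.5f);
	}
}

int main(void) {
	int width = 16, height = 16;
	unsigned short *data;  // raw fp16 bit patterns, one channel per texel
	size_t pitch;
	cudaMallocPitch((void**)&data, &pitch, width * sizeof(unsigned short), height);
	// ... fill data with half-float bit patterns ...
	cudaChannelFormatDesc desc = cudaCreateChannelDescHalf();
	cudaBindTexture2D(NULL, tex, data, desc, width, height, pitch);
	// ... launch readHalf and check the results ...
	cudaUnbindTexture(tex);
	cudaFree(data);
	return 0;
}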

Looks like I have a problem. I draw into an OpenGL texture and map it to a cudaArray_t.

GLuint texture = 0;
cudaGraphicsResource_t resource = nullptr;
cudaArray_t cudaArray = nullptr;

// ... generation, binding, rendering, etc.

// register the GL texture for surface load/store, then map it to a CUDA array
cudaGraphicsGLRegisterImage(&resource, texture, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsSurfaceLoadStore);
cudaGraphicsMapResources(1, &resource);
cudaGraphicsSubResourceGetMappedArray(&cudaArray, resource, 0, 0);

cudaResourceDesc desc = {};  // zero-initialize so unused union members are not garbage
desc.resType = cudaResourceTypeArray;
desc.res.array.array = cudaArray;
cudaSurfaceObject_t surface;
cudaCreateSurfaceObject(&surface, &desc);
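
For completeness, once the kernel has run I tear everything down in reverse order:

cudaDestroySurfaceObject(surface);
cudaGraphicsUnmapResources(1, &resource);
cudaGraphicsUnregisterResource(resource);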

If I understood your code correctly, I should create a cudaChannelFormatDesc with cudaCreateChannelDescHalf(). But to use it I would have to fill in the linear struct from the cudaResourceDesc.res union:

struct {
	void *devPtr;                      /**< Device pointer */
	struct cudaChannelFormatDesc desc; /**< Channel descriptor */
	size_t sizeInBytes;                /**< Size in bytes */
} linear;

instead of the array struct:

struct {
	cudaArray_t array;                 /**< CUDA array */
} array;

That forces me to use a device pointer, which I could get with cudaGraphicsResourceGetMappedPointer(), but that function only works with buffers, not with textures.

Do I have to rewrite my OpenGL code to use buffers instead of textures, or is there another way?
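
Or could I skip linear memory entirely and create a texture object directly on the mapped cudaArray? Just a guess on my part, untested:

// the array mapped from the GL_RGBA16F texture already carries a half channel
// format, so a texture object on it should hand back floats from tex2D
cudaResourceDesc resDesc = {};
resDesc.resType = cudaResourceTypeArray;
resDesc.res.array.array = cudaArray;

cudaTextureDesc texDesc = {};
texDesc.addressMode[0] = cudaAddressModeClamp;
texDesc.addressMode[1] = cudaAddressModeClamp;
texDesc.filterMode = cudaFilterModePoint;
texDesc.readMode = cudaReadModeElementType;  // fp16 texels come back as fp32
texDesc.normalizedCoords = 0;

cudaTextureObject_t texObj;
cudaCreateTextureObject(&texObj, &resDesc, &texDesc, nullptr);

// in the kernel: float4 texel = tex2D<float4>(texObj, x + 0.5f, y + 0.5f);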

I don’t know anything about CUDA/OpenGL interop. My assumption was that you can somehow import an OpenGL texture into CUDA, e.g. by retrieving a pointer to the underlying memory and then binding that to a texture on the CUDA side.