Mapping to a GL_TEXTURE_RECTANGLE

Hi

I’m a newcomer to CUDA; I’ve read a book and have jumped in at the deep end :)

I’m struggling to read image data from a GL_TEXTURE_RECTANGLE texture; the info online is pretty mixed up due to SDK changes over the years. I think what I have would work on a normal texture. Does anyone have any ideas of what would need to be changed?

The GL_TEXTURE_RECTANGLE is RGBA float.

My texture reference:

texture<float4, cudaTextureType2D, cudaReadModeElementType> texRef;

My simple kernel:

__global__ void kernel(float *dev_output, int width, int height)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    if ( x > width || y > height ) return;

    float4 pixel = tex2D(texRef, x, y);

    dev_output[offset*4 + 0] = pixel.x;
    dev_output[offset*4 + 1] = pixel.y;
    dev_output[offset*4 + 2] = pixel.z;
    dev_output[offset*4 + 3] = pixel.w;
}

And my code that calls it:

cudaGraphicsResource *resource;
cudaArray *array;

cudaGraphicsGLRegisterImage(&resource, textureID, GL_TEXTURE_RECTANGLE, cudaGraphicsRegisterFlagsReadOnly);
cudaGraphicsMapResources(1, &resource, NULL);
cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0);

cudaBindTextureToArray(texRef, array);

// create buffer on device to store output
float *dev_output;
cudaMalloc((void**)&dev_output, renderedWidth * renderedHeight * sizeof(float) * 4);

dim3 grids(renderedWidth / 16, renderedHeight / 16);
dim3 threads(16, 16);

kernel<<<grids, threads>>>(dev_output, renderedWidth, renderedHeight);

cudaMemcpy(dataptr, dev_output, renderedWidth * renderedHeight * sizeof(float) * 4, cudaMemcpyDeviceToHost);

cudaFree(dev_output);
cudaUnbindTexture(texRef);
cudaGraphicsUnmapResources(1, &resource, NULL);
cudaGraphicsUnregisterResource(resource);

The result is an empty buffer but no CUDA errors; if in the kernel I just write fixed values, I get the expected results in the buffer.
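For reference, the error checking is roughly the macro below (a sketch; in the real code every runtime call and the kernel launch go through it):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch of a checking helper: every CUDA runtime call returns a cudaError_t,
// so wrapping each call makes silent failures visible.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess)                                       \
            printf("CUDA error at %s:%d: %s\n", __FILE__, __LINE__,   \
                   cudaGetErrorString(err));                          \
    } while (0)

// Usage, e.g.:
// CUDA_CHECK(cudaGraphicsMapResources(1, &resource, NULL));
// kernel<<<grids, threads>>>(...);
// CUDA_CHECK(cudaGetLastError());       // catches launch errors
// CUDA_CHECK(cudaDeviceSynchronize());  // catches execution errors
```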

Any ideas anyone?

I’m not a CUDA expert, but the index check in line 7 is incorrect if width and height are the resolution of the image. It needs to bail when greater than or equal to the rendered size, or you index out of bounds.
That check can also be moved before the offset is calculated in line 5.

A texture rectangle is accessed with unnormalized coordinates [0, width) x [0, height) instead of [0.0f, 1.0f) x [0.0f, 1.0f). If that is not the default on your texRef, all your integer accesses with x, y will wrap and access the lower-left texel. If that is black, you have the correct result.
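In kernel code the difference looks roughly like this (a sketch; the + 0.5f addresses the texel center for point sampling):

```cuda
// Unnormalized addressing (what a texture rectangle uses):
// coordinates are in texels, [0, width) x [0, height).
float4 a = tex2D(texRef, x + 0.5f, y + 0.5f);

// Normalized addressing: coordinates are in [0.0f, 1.0f).
// If the reference is in this mode, plain integer (x, y) accesses
// wrap and keep hitting the lower-left texel.
float4 b = tex2D(texRef, (x + 0.5f) / (float)width, (y + 0.5f) / (float)height);
```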

Ah OK, that makes sense.

How would I go about that? It doesn’t seem to be wrapping, as I am getting nothing but 0.0f, and the texture doesn’t contain that…

For textures created inside the CUDA runtime you would switch the cudaTextureDesc field normalizedCoords to 0 to get unnormalized coordinates.
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#texture-object-api

For texture references it’s one paragraph further down:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#texture-reference-api
and there it’s the normalized field.
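With the texture reference declared at file scope, that would look something like this before binding (a sketch, untested against 5.5):

```cuda
// Force unnormalized, point-sampled, clamped access on the reference
// before binding the mapped array to it.
texRef.normalized     = 0;                     // texel coordinates, not [0,1)
texRef.filterMode     = cudaFilterModePoint;   // no interpolation
texRef.addressMode[0] = cudaAddressModeClamp;
texRef.addressMode[1] = cudaAddressModeClamp;

cudaBindTextureToArray(texRef, array);
```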

Not sure if any of that happens automatically when registering the texture rectangle target. I doubt it.

The examples in the CUDA Toolkit address standard textures with x / (float) width and y / (float) height. If that works in your code that would be the culprit.

So changing the kernel to:

__global__ void kernel(float *dev_output, int width, int height)
{
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int offset = x + y * blockDim.x * gridDim.x;

    if ( x >= width || y >= height ) return;

    float4 pixel = tex2D(texRef, x / (float)width, y / (float)height);

    dev_output[offset*4 + 0] = pixel.x;
    dev_output[offset*4 + 1] = pixel.y;
    dev_output[offset*4 + 2] = pixel.z;
    dev_output[offset*4 + 3] = pixel.w;
}

Sadly gives me the same result.

Oh, I should say that I am using SDK 5.5; looking at your link, the interop stuff has changed yet again, but I am stuck with 5.5.

I saw the textureReference.normalized value earlier in the docs, it is set to the default 0.

Still struggling with this…

I can’t google a single example of somebody mapping a GL_TEXTURE_RECTANGLE to CUDA - can anyone point me to an example of this actually working?

I have a feeling that though cudaGraphicsGLRegisterImage is meant to accept a GL_TEXTURE_RECTANGLE, there is an additional step I have to take.
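One way I could narrow it down might be to copy the mapped array straight back to the host, bypassing the texture fetch entirely: if that copy is also all zeros, the interop mapping itself is the problem; if it holds the pixels, the texture setup is. A sketch (cudaMemcpyFromArray is in the 5.5 runtime):

```cuda
// After cudaGraphicsSubResourceGetMappedArray(&array, resource, 0, 0):
size_t bytes = renderedWidth * renderedHeight * sizeof(float) * 4;
float *check = (float *)malloc(bytes);

// Read the cudaArray contents directly, without going through texRef.
cudaMemcpyFromArray(check, array, 0, 0, bytes, cudaMemcpyDeviceToHost);
printf("first texel: %f %f %f %f\n", check[0], check[1], check[2], check[3]);

free(check);
```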