Problems using 2D CUDA textures in OptiX 7.3

I’m not sure if this is a CUDA problem or an OptiX problem. I am just starting out with CUDA and OptiX.

I started with the OptiX 7.3 optixTriangle sample and have written my code based on that example. At this point I am trying to draw a cube and apply a 2D texture to it. The basics of drawing the cube, as well as rotating, scaling, or moving it by applying host-side transformations, seem to be working. Applying the texture to the cube is not working: what I end up with is a cube that looks like I’ve applied a uniform color to all the faces. If I print the color value in my closest hit program, the values for the red, green, and blue channels vary by very small amounts, not enough to really affect the color.

The cube length in each dimension is 1.0 and the cube is centered at the origin, so the vertex coordinates are ±0.5 in each dimension. The cube is drawn centered in the window as I expect with no transformations, so I see just the front face of the cube.

The cube and the texture are exported from Blender as a Wavefront OBJ file. The texture image file is 32 bits per pixel: 8 bits each for R, G, B, and alpha. When building the texture object I convert each byte to float and store that in my texture array.
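The byte-to-float conversion happens in a helper (called copyImage in the code below); a standalone sketch of what it does, assuming a tightly packed 8-bit RGBA buffer rather than the QImage actually used:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the conversion copyImage() performs:
// expand each 8-bit channel to a normalized float in [0.0, 1.0].
// Assumes a tightly packed RGBA8 source buffer; the caller owns the result.
float *convertRgba8ToFloat(const uint8_t *src, int width, int height) {
    const size_t count = static_cast<size_t>(width) * height * 4;
    float *dst = new float[count];
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i] / 255.0f;  // 0..255 -> 0.0..1.0
    }
    return dst;
}
```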

The code to build the texture object and copy it to the GPU is

bool Texture::load(void) {
    QImage image(imagePath);
    if (image.isNull()) {
        fprintf(stderr, "Unable to load texture %s\n", imagePath.toLatin1().data());
        return false;
    }
    int imageWidth = image.width();
    int imageHeight = image.height();
    cudaChannelFormatDesc channelDescriptor = cudaCreateChannelDesc(32, 32, 32, 32, cudaChannelFormatKindFloat);
    cudaError_t rc = cudaMallocArray(&gpuImageArray, &channelDescriptor, image.width(), image.height());
    if (rc != cudaSuccess) {
        fprintf(stderr, "Unable to allocate GPU array for %s: %s\n", imagePath.toLatin1().data(), cudaGetErrorString(rc));
        return false;
    }
    float *imageData = copyImage(image);  // converts 8-bit RGBA channels to float
    if (imageData == nullptr) {
        return false;
    }
    rc = cudaMemcpy2DToArray(gpuImageArray, 0, 0, imageData, imageWidth * sizeof(float) * 4, imageWidth * sizeof(float) * 4, imageHeight, cudaMemcpyHostToDevice);
    delete[] imageData;
    if (rc != cudaSuccess) {
        fprintf(stderr, "Error copying texture to GPU: %s\n", cudaGetErrorString(rc));
        gpuImageArray = nullptr;
        return false;
    }
    cudaResourceDesc resourceDescriptor = { };
    resourceDescriptor.resType = cudaResourceTypeArray;
    resourceDescriptor.res.array.array = gpuImageArray;
    cudaTextureDesc textureDescriptor = { };
    textureDescriptor.addressMode[0] = cudaAddressModeClamp;
    textureDescriptor.addressMode[1] = cudaAddressModeClamp;
    textureDescriptor.borderColor[0] = 0.0f;
    textureDescriptor.borderColor[1] = 0.0f;
    textureDescriptor.disableTrilinearOptimization = 1;
    textureDescriptor.filterMode = cudaFilterModeLinear;
    textureDescriptor.maxAnisotropy = 1;
    textureDescriptor.minMipmapLevelClamp = 1.0f;
    textureDescriptor.maxMipmapLevelClamp = 99.0f;
    textureDescriptor.mipmapFilterMode = cudaFilterModePoint;
    textureDescriptor.mipmapLevelBias = 0.0f;
    textureDescriptor.normalizedCoords = false;
    textureDescriptor.readMode = cudaReadModeElementType;
    textureDescriptor.sRGB = 0;
    rc = cudaCreateTextureObject(&texture, &resourceDescriptor, &textureDescriptor, nullptr);
    if (rc != cudaSuccess) {
        fprintf(stderr, "Unable to create texture object: %s\n", cudaGetErrorString(rc));
        gpuImageArray = nullptr;
        return false;
    }
    return true;
}

My closest hit program is

static __forceinline__ __device__ void setPayload(float3 p) {
    optixSetPayload_0(__float_as_uint(p.x));
    optixSetPayload_1(__float_as_uint(p.y));
    optixSetPayload_2(__float_as_uint(p.z));
}

extern "C" __global__ void __closesthit__ch() {
    // Adapted from /home/dave/src/NVIDIA-OptiX-SDK-7.3.0-linux64-x86_64/SDK/cuda/LocalGeometry.h and /home/dave/src/NVIDIA-OptiX-SDK-7.3.0-linux64-x86_64/SDK/cuda/
    // Get access to hit data structure used by this shader
    const HitGroupData *hitData = reinterpret_cast<HitGroupData*>(optixGetSbtDataPointer());
    const int *vertexIndices = reinterpret_cast<int*>(hitData->modelData->vertexIndices);
    const float3 *vertices = reinterpret_cast<float3*>(hitData->modelData->vertices);
//    const float3 *normals = reinterpret_cast<float3*>(hitData->modelData->normals);
    const float2 *texture = reinterpret_cast<float2*>(hitData->modelData->texture);

    // Get the barycentric coordinates for this ray/triangle intersection
    const float2 barycentrics = optixGetTriangleBarycentrics();

    // Get the index into the vertex index, normal, and texture arrays for the triangle that was hit. optixGetPrimitiveIndex returns the index of the triangle as defined when building the
    // acceleration data structures. The vertex index, normal and texture arrays contain one entry for each triangle vertex, so multiply the triangle index by 3.
    // These arrays must be built in sync with the data in the acceleration structures, such that the same triangle index identifies the same triangle in both.
    const unsigned int triangle = optixGetPrimitiveIndex() * 3;

    // Get the indices for the triangle that was hit
    const int triIndex1 = vertexIndices[triangle];
    const int triIndex2 = vertexIndices[triangle + 1];
    const int triIndex3 = vertexIndices[triangle + 2];

    // Now get the triangle vertices
    const float3 triVertex1 = vertices[triIndex1];
    const float3 triVertex2 = vertices[triIndex2];
    const float3 triVertex3 = vertices[triIndex3];

    // Get texture coordinates for the triangle that was hit
    const float2 textureV1 = texture[triangle];
    const float2 textureV2 = texture[triangle + 1];
    const float2 textureV3 = texture[triangle + 2];
    const float2 geometryUV = (1.0f - barycentrics.x - barycentrics.y) * textureV1 + barycentrics.x * textureV2 + barycentrics.y * textureV3;

    const float3 color = make_float3(tex2D<float4>(hitData->textureObject, geometryUV.x, geometryUV.y));
    setPayload(color);
}

I use an array of vertices and a vertex index array when building my acceleration structures.

I’m not sure whether I need these, but for the moment I am copying the vertex array, the vertex index array, and the texture coordinate array into a block of GPU memory and setting a pointer to those arrays in the parameter block for my closest hit program. There is a 1:1 correspondence between the vertex array and the texture array, as in the original .obj file.
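For reference, the data reachable from the SBT record in my closest hit program is laid out roughly like this (a sketch using plain host-side types; the actual structs use CUDA device pointers and a cudaTextureObject_t handle):

```cpp
// Hypothetical layout matching the names used in the closest hit program.
// On the device side these fields are raw device pointers and a
// cudaTextureObject_t (an unsigned long long handle).
struct ModelData {
    void *vertexIndices;  // int[3 * numTriangles], one entry per triangle corner
    void *vertices;       // float3 positions
    void *normals;        // float3 normals, 1:1 with vertices
    void *texture;        // float2 texture coordinates, 1:1 with vertices
};

struct HitGroupData {
    ModelData *modelData;             // pointer to the arrays above
    unsigned long long textureObject; // cudaTextureObject_t handle
};
```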

One thing I’m not sure of is whether I need to convert my texture image data from an array of unsigned char to an array of float, either for it to work at all or for performance reasons. It does increase memory requirements by a factor of four.

I think I’m doing something wrong with my tex2D call, but I have no idea what. Explanations of what I’m doing wrong are appreciated.

I’m running Fedora 33 Linux, driver 470.74, CUDA 11.4.100, and OptiX 7.3.

Well, I found at least part of my problem. If I set textureDescriptor.normalizedCoords to true when building the cudaTextureDesc then I get cube faces that look reasonable. If I set it to false, as in my original post, then the cube face coloring is obviously wrong.

I’ve been staring at this for days, and it only became clear after posting.

I’m still wondering whether my texture has to be a float per color channel for each pixel, or if I can reduce memory requirements by using unsigned char for each channel as in the original data.

OptiX 7 doesn’t have any API related to textures. That means all texture object handling happens in native CUDA host code, and the texture fetch functions are used in the device code.
So if anything is not working there, it’s related to your CUDA code.

Note that normalizedCoords is an int. I wouldn’t assign a boolean to that for code style reasons.

There is a 1:1 correspondence between the vertex array and the texture array, as in the original .obj file.

Mind that the OBJ file format itself uses different pools for vertices, normals and texture coordinates. That’s why faces are defined with three indices per vertex inside *.obj files.
That is not how vertex arrays in graphics APIs are defined. Those use one index per vertex and have same-sized arrays for vertices, normals, and texture coordinates.
That means the loader probably reformatted the data to have that 1:1 relationship.
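The reformatting described above can be sketched like this: for each OBJ face corner, look up the separate position and texture-coordinate pools and emit one combined vertex (a simplified sketch that ignores normals and does no de-duplication; names are illustrative):

```cpp
#include <vector>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

// One OBJ face corner: an index into the position pool and an index
// into the texcoord pool (0-based here; indices in .obj files are 1-based).
struct Corner { int positionIndex; int texcoordIndex; };

// Flatten OBJ-style multi-index faces into same-sized vertex and
// texcoord arrays with a single index list, as graphics APIs expect.
// Simplified: emits one vertex per corner, no de-duplication.
void flattenObjFaces(const std::vector<Vec3> &positions,
                     const std::vector<Vec2> &texcoords,
                     const std::vector<Corner> &corners,
                     std::vector<Vec3> &outVertices,
                     std::vector<Vec2> &outTexcoords,
                     std::vector<int> &outIndices) {
    for (const Corner &c : corners) {
        outIndices.push_back(static_cast<int>(outVertices.size()));
        outVertices.push_back(positions[c.positionIndex]);
        outTexcoords.push_back(texcoords[c.texcoordIndex]);
    }
}
```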

I’m still wondering whether my texture has to be a float per color channel for each pixel, or if I can reduce memory requirements by using unsigned char for each channel as in the original data.

You can use all texture formats supported by CUDA, which includes RGBA8 as well, so there is no need to convert the data to float.
You control how the data is read with the texture descriptor’s readMode field, which should be set to cudaReadModeNormalizedFloat in that case.
You will then get normalized float values in the range [0.0, 1.0] when using the respective texture fetch function overload, like tex2D<float4>(textureObject, u, v), on such an RGBA8 texture.

Mind that CUDA only supports 1-, 2-, and 4-component textures; there is no 3-component texture support in the hardware. That means RGB8 data, like that loaded from a JPG, will need to be converted to RGBA8 by adding an appropriate alpha channel.
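Padding RGB8 data to RGBA8 on the host is a simple repacking loop; a sketch, assuming tightly packed input and an opaque alpha of 255:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pad tightly packed RGB8 pixels to RGBA8 by appending an opaque alpha,
// since CUDA arrays only support 1-, 2-, and 4-component formats.
std::vector<uint8_t> rgb8ToRgba8(const uint8_t *rgb, int width, int height) {
    const size_t pixels = static_cast<size_t>(width) * height;
    std::vector<uint8_t> rgba(pixels * 4);
    for (size_t i = 0; i < pixels; ++i) {
        rgba[i * 4 + 0] = rgb[i * 3 + 0];
        rgba[i * 4 + 1] = rgb[i * 3 + 1];
        rgba[i * 4 + 2] = rgb[i * 3 + 2];
        rgba[i * 4 + 3] = 255;  // opaque alpha
    }
    return rgba;
}
```

The resulting buffer can then be uploaded with a channel descriptor of four 8-bit unsigned components instead of four 32-bit floats.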

Maybe have a look at my OptiX 7 examples, which show how to convert and upload different input formats for all texture targets (1D, 2D, 3D, cubemap), with or without mipmaps and with or without layers.
There are identical implementations using the CUDA runtime and CUDA driver API here:

The more advanced examples also contain code to update the texture image of already created CUDA texture objects, but those are only using the CUDA driver API because that allows more precise control over the CUDA contexts in multi-GPU configurations.


You answered my questions. Thank you.
