tex2D in OptiX7

Hi !

I did not find anything about “tex2D” in the OptiX 7 API Reference or in the Programming Guide.
So obviously tex2D comes from CUDA.
But I did not find the answer to my question at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html either:

When I use tex2D in an OptiX7-kernel:

tex2D<float4>( id, UV.x , UV.y ).w;

for a 2D image texture created with:

channel_desc = cudaCreateChannelDesc<float4>();  // <- must this type match the type specified at tex2D()?
cudaMallocArray(&cuda_array, &channel_desc, width, height);
cudaTextureDesc tex_desc = {};
if (is_float)  // is_float: placeholder for "the source data is a float type"
    tex_desc.readMode = cudaReadModeElementType;
else           // integer source data, read as normalized float
    tex_desc.readMode = cudaReadModeNormalizedFloat;
cudaCreateTextureObject(&cuda_tex, &res_desc, &tex_desc, nullptr);  // init cudaTextureObject_t object

Obviously an OptiX 7 kernel does not always require the cudaTextureObject_t object to be initialized with the same type that is used at tex2D<TYPE>.

So it's also possible to use a smaller texture, right?
For example, 8 bits per component (as most image files use) for the init of the cudaTextureObject_t object:

channel_desc = cudaCreateChannelDesc<uchar4>();

and then access it the same way as if it were a float4:

tex2D<float4>( id, UV.x , UV.y ).w;

To save memory in my app I tried using it that other way (and now it works), and I also saw this handling in Ingo Wald's example code.
Before I always used float4 (although the source images are mostly 8bit per component).

Is there a speed advantage when using float4 over uchar4 in cudaTextureObject_t ?

Thank you!

My System: OptiX 7.0.0 SDK, CUDA 10.1.243, GTX 1050 2GB, Win10PRO 64bit (version 1809; build 17763.107), device driver: 442.50, VS2019 v16.4.5 (toolkit v140 of VS2015), MDL SDK 2019.2

This is all native CUDA. OptiX 7 doesn’t even know about textures at all.
Read these chapters below (ignore the texture references):

You define how the texture is read via the cudaTextureReadMode in the CUDA Runtime API
or, respectively, the CUDA_TEXTURE_DESC flags in the CUDA Driver API,
before creating the texture object.

When reading as normalized floats, the integer data will be interpreted as fixed-point values between 0.0 and 1.0. (Same as in OpenGL for example.)
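
To make the normalized read concrete, here is a minimal host-side sketch of what a point-sampled fetch from an 8-bit unsigned texture returns when it is read as normalized floats. It is plain C++ that only models the arithmetic; the struct and function names (Texel8, TexelF, unorm8ToFloat, fetchNormalized) are made up for this illustration and are not CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side stand-ins for CUDA's uchar4 and float4 (hypothetical names).
struct Texel8 { std::uint8_t x, y, z, w; };
struct TexelF { float x, y, z, w; };

// Map one unsigned 8-bit component to [0.0, 1.0].
inline float unorm8ToFloat(std::uint8_t v)
{
    return static_cast<float>(v) / 255.0f;
}

// Point-sampled fetch at integer texel coordinates (no filtering, no wrapping).
inline TexelF fetchNormalized(const std::vector<Texel8>& image,
                              std::size_t width, std::size_t x, std::size_t y)
{
    assert(width > 0 && x < width);
    const std::size_t index = y * width + x;
    assert(index < image.size());
    const Texel8& t = image[index];
    return { unorm8ToFloat(t.x), unorm8ToFloat(t.y),
             unorm8ToFloat(t.z), unorm8ToFloat(t.w) };
}
```

With this model, a stored component of 255 reads back as 1.0f and 0 reads back as 0.0f, which is why a uchar4 texture can be fetched with tex2D<float4>() when the read mode is cudaReadModeNormalizedFloat.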

The Texture::create() function inside the above files is the starting point for creating any available combination of 1D, 2D, 3D, cube, layered, and mipmapped texture objects.
(Only the 16-bit half formats are not implemented there yet, because the DevIL 1.7.8 loader converts them to floats. DevIL 1.8.0 is built without support for EXR.)

The rtigo3 example there also contains methods to update the texture image.
Only 1-, 2-, and 4-component textures are supported by the hardware.
The image data is converted to CUDA compatible formats on the fly with the 49 provided convert() functions.
The host- and device- encoding bitfields control what conversion routine is applied before upload.
(It might make sense to do that directly after load, but many picture formats are only 3-channel so this saves host memory.)
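
As an illustration of the kind of conversion meant here, the following host-side sketch pads tightly packed 8-bit RGB data to RGBA with an opaque alpha. The function name expandRgb8ToRgba8 is hypothetical; it is not one of the convert() routines in the repository, just a minimal example of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand tightly packed 8-bit RGB pixels to RGBA by appending alpha = 255.
// Hypothetical helper for illustration only.
std::vector<std::uint8_t> expandRgb8ToRgba8(const std::vector<std::uint8_t>& rgb)
{
    assert(rgb.size() % 3 == 0); // input must contain whole RGB triples
    const std::size_t pixelCount = rgb.size() / 3;
    std::vector<std::uint8_t> rgba(pixelCount * 4);
    for (std::size_t i = 0; i < pixelCount; ++i)
    {
        rgba[i * 4 + 0] = rgb[i * 3 + 0]; // R
        rgba[i * 4 + 1] = rgb[i * 3 + 1]; // G
        rgba[i * 4 + 2] = rgb[i * 3 + 2]; // B
        rgba[i * 4 + 3] = 255;            // "one" alpha in 8-bit fixed point
    }
    return rgba;
}
```

The resulting buffer has four components per pixel and could then be uploaded to a uchar4 CUDA array.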

Absolutely not. Quite the opposite. float4 needs four times the memory bandwidth and more space in the texture cache.
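
The factor of four follows directly from the component size. A small sketch of the arithmetic (textureBytes is a hypothetical helper, and it ignores any padding or tiling the driver may add to a CUDA array):

```cpp
#include <cstddef>

// Raw size in bytes of a 2D texture level, ignoring driver-side padding.
constexpr std::size_t textureBytes(std::size_t width, std::size_t height,
                                   std::size_t components,
                                   std::size_t bytesPerComponent)
{
    return width * height * components * bytesPerComponent;
}

// Example: a 1024 x 1024 RGBA image.
constexpr std::size_t kUchar4Bytes = textureBytes(1024, 1024, 4, 1); // uchar4: 1 byte per component
constexpr std::size_t kFloat4Bytes = textureBytes(1024, 1024, 4, 4); // float4: 4 bytes per component
```

For that 1024 x 1024 image, the uchar4 version takes 4 MiB and the float4 version 16 MiB, so every fetch also moves four times as many bytes through the texture cache.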

Thank you Detlef, for the clarification!

OK, that is what I was not completely aware of in this context.

Good to know. I thought the int-to-float conversion on each fetch might take longer. But it's good that it's not that way!

But I did not find any information on whether integer-based tex2D() fetches can read directly from sRGB space (as, for example, a DXGI_FORMAT_B8G8R8A8_UNORM_SRGB texture in DirectX 11) without first converting such a texture into linear space.

Converting input data in my Path Tracer is much simpler.
I have my own module for image input (many image formats; taking care of sRGB based on the desired usage) using the Windows Imaging Component (WIC).
The output there is always an 8- or 16-bit uchar4 or float4 BGRA image.
For DDS I use the DDSTextureLoader to get float4 or float output only.
And HDR always provides float4 to preserve the HDR range.

In your code you do the encoding conversion on the host for each mipmap level.
For clearly showing the process this is great.

I implemented mipmap levels with a derivative of:
CUDA Samples\v10.1\2_Graphics\bindlessTexture\bindlessTexture.cpp
building them on the GPU based on the already converted mipmap level 0. The host memory is only used temporarily until
mipmap level 0 is uploaded to the GPU. The remaining mipmap levels are created on the GPU, and afterwards only GPU data remains;
no host memory at all.
That should be much faster, shouldn't it?
Or am I missing something?
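
For reference, the size of such a mip chain can be worked out on the host before allocating it. This is a minimal sketch assuming a full chain that halves each extent (rounding down) until 1x1; the function names are hypothetical and the code is not taken from the bindlessTexture sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of levels in a full mip chain, from level 0 down to 1x1.
inline unsigned int mipLevelCount(std::size_t width, std::size_t height)
{
    assert(width > 0 && height > 0);
    unsigned int levels = 1;
    std::size_t extent = std::max(width, height);
    while (extent > 1)
    {
        extent /= 2; // each level halves the larger extent, rounding down
        ++levels;
    }
    return levels;
}

// Extent of one dimension at the given level, never smaller than one texel.
inline std::size_t mipExtent(std::size_t base, unsigned int level)
{
    return std::max<std::size_t>(base >> level, 1);
}
```

A 1024 x 512 image, for example, gets 11 levels under this scheme, and the shorter side stays clamped at one texel for the last level.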

To access such a mip level I use tex2DLod(). Or is there another way?
I always use the CUDA Runtime API.

From the docs you mentioned (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#texture-fetching) I found:
"[…]G.2. Linear Filtering[…] is only available for floating-point textures[…]", but in this example:
it's used with a uchar4 texture.
Also in your code https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo3/src/Texture.cpp#L675 a linear filter mode
is selected by default for the “UNSIGNED_INT8” array default format. And I also found no call of setFilterMode() anywhere in your samples.
So is point filtering automatically used on non-float textures regardless of the filterMode setting?
Or did I misunderstand the docs?

Thank you!

Textures can be sRGB, but that only works on unsigned char fixed point formats.
See comment here:
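
As an aside, this is what the sRGB-to-linear decode computes per color component. The sketch below is host-side C++ using the standard sRGB transfer function; the function name srgbToLinear is hypothetical, and this is an illustration of the math, not the code referenced above.

```cpp
#include <cassert>
#include <cmath>

// Decode one sRGB-encoded component in [0, 1] to linear light.
// Standard piecewise sRGB transfer function; alpha is not affected by sRGB.
inline float srgbToLinear(float c)
{
    assert(c >= 0.0f && c <= 1.0f);
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}
```

When the texture itself is flagged as sRGB, this decode happens as part of the fetch, so the kernel receives linear values without running such a function per sample.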

I am not generating mipmap levels in my examples. I just read what is inside the picture.

The CPU conversion routines are able to convert between any two supported formats and can change the component layout and add a “one” alpha channel.

Both of these things would of course be much faster on the GPU.

That code began around 2013. The main purpose was to have CUDA runtime and driver API code showing how to create texture objects for all texture targets with layers and mipmaps.

The tex2DLod() fetches from one level only. The tex2DGrad() directly below that is doing mipmap filtering. In a ray tracer that would require ray differentials to calculate the dx and dy arguments.
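
To show why the derivatives matter, here is a rough host-side sketch of how a mip level can be derived from dx and dy. It follows the common OpenGL-style isotropic formula and is only an approximation for illustration; the exact selection performed by tex2DGrad() on the hardware may differ, and the names Vec2 and approximateMipLevel are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };

// Approximate isotropic mip level from UV derivatives (in normalized
// texture coordinates) and the level-0 texture size in texels.
inline float approximateMipLevel(Vec2 dx, Vec2 dy, float width, float height)
{
    // Scale the derivatives from UV space to texel space.
    const float dxLen = std::sqrt(dx.x * width * dx.x * width +
                                  dx.y * height * dx.y * height);
    const float dyLen = std::sqrt(dy.x * width * dy.x * width +
                                  dy.y * height * dy.y * height);
    const float rho = std::max(dxLen, dyLen); // footprint size in texels
    // A footprint of one texel or less maps to level 0 (magnification).
    return std::max(std::log2(rho), 0.0f);
}
```

A footprint of one texel per pixel yields level 0, and every doubling of the footprint moves one level up the chain. A value like this could also be passed to tex2DLod() to pick a single level explicitly.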

I let the filter mode default to linear filtering and no mipmaps.

If you need a different setting, you must set that before creating the texture object because these settings are immutable. That means if you want to toggle them, you need to create a new texture object.
That’s why the helper functions setting these always check that the texture object is still null.
Only the image data can be updated on the fly as shown in the rtigo3 example Texture::update() functions.

The examples are only using three textures of which two (the jpg and png) are using unsigned char components which are working just fine with my defaults.
The spherical HDR 2D environment texture is one example where the device encoding is forced to RGBA32F and the wrap settings are changed, but that’s handled inside the Texture class.

Thank you very much for these insights!