How to create cudaTextureObject with texture raw data in Block Compressed (BC) format?

Hi,
Now I’ve got a block of 2D texture raw data loaded from file (which has already been compiled and encoded with BC3/S3TC format). The CUDA runtime API cudaCreateTextureObject() require 3 descriptive struct, and the cudaResourceViewDesc seems to support BC3 format.
My questions are:

  • What should be the width/height/channelDesc when I cudaMallocArray()? What should be noted to fill those desc struct?
  • How to sample this 2D texture object of compressed format in CUDA program? Is Tex2D() ok?

BC3 format encodes 4x4 pixel block with 16 bytes, so the concept ‘pixel’ became useless (seems to me). Now I am allocating BlockWidth * BlockHeight with 4 uint32 channels for my resource and the texture can’t be sampled properly.

Environment:
Optix SDK 7.0, Win10, CUDA 10.2, GTX 1060

Thanks!

Hey this is a good question, and I don’t know the answer. Someone else on the team might be able to help tomorrow, but I thought you might be able to find an answer more quickly if you cross-posted this to the CUDA forum, since it’s really more of a CUDA API question than OptiX. https://forums.developer.nvidia.com/c/accelerated-computing/cuda/cuda-programming-and-performance/7

Hopefully someone here can help, but please do post here with the answer if you find out.


David.

Thanks David for your reply!
After a whole day’s suffering trial and errors, I managed to find out the answer (probably):

For example, if we are using a 256x256 texture with BC3 format, which encodes 4x4 pixels into one block with 16 bytes, we can know that this texture has 64x64 compressed blocks.

  • According to some hints from the document of cudaCreateTextureObject(), the ‘underlying resource’ channel desc should be uint4 / R32G32B32A32_UNSIGNED_INT for BC3 format, width =64, height = 64. Then, the resourceViewDesc should be filled with width = 256, height = 256, format=BC3 to re-interpret the ‘underlying resource’ in another way.
  • tex2D< float4 >() can be used in CUDA kernel to sample the texture.
  • texture filter mode can’t be set to ‘Linear’ because the resource use UNSIGNED_INT channels.

anyway, it seems to work finally.

Be careful about block compressed mipmaps though. The CUDA implementation of block compressed textures is not handling any texture levels with extents which are not a multiple of 4.
That’s due to the order in which the mipmaps are allocated and then re-interpreted with a resource view as a block compressed format. Since the mipmaps are not allocated with the necessary rounding (up) to 4x4 blocks but using the actual rounded (down) texels, the texture level extents are not matching the required block compressed extends for those.

Means non-mipmapped textures with extents being a multiple of 4 will work. Other textures would need to be scaled.
Mipmaps would need to be limited to the smallest level which still has multiple of 4 extents both when downloading and sampling (set the texture description max mipmap level clamp.)

That block compressed resource view method is insufficient and requires a different API with native block compressed formats instead to determine the correct mipmap sizes in blocks. This is a long standing CUDA bug I filed after analyzing this problem:
https://forums.developer.nvidia.com/t/mip-map-with-block-compressed-texture/72170

2 Likes

Thanks detlef! Our project compiles textures to make sure its width/height are power of 2. And I abandoned last 2 mip levels when I create the CUDA texture object. This is a compromise I worked out for BC mipmap problem.

Right, but that will only work for square non-power of two textures.
Rectangular textures can have more than two mipmap levels with extents smaller than 4.