Example of how to use tex2DLod() with plain image data in regular CUDA memory?

I am working on a custom image warp in CUDA, and so far I have implemented mipmapping and trilinear sampling in plain CUDA code. This is suboptimal caching-wise, since linear memory has no knowledge of the 2D locality of my image data. I also assume I am burning general-purpose cycles on texture interpolation that dedicated hardware could handle.

I would like to optimize this step and would therefore like to use the tex2DLod() and tex2DGrad() functions from inside my regular kernel function.

I have not been able to find an example to work from. I need something demonstrating how to transform my regular cudaMalloc()'ed array into the structures needed by the aforementioned tex2D* flavors.

Is there an example or article available that I can draw some inspiration from?

Kind regards

Jesper

The CUDA sample codes bindlessTexture and vulkanImageCUDA both demonstrate the usage of tex2DLod.

I’m definitely not an expert on this, but based on the programming guide as well as the comment in the bindlessTexture sample code, the usage of tex2DGrad appears to be similar, except that the LOD is inferred from the supplied gradients instead of being passed in explicitly as a level. I imagine the underlying setup of the data is identical.
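To make the difference concrete, here is a minimal sketch (not taken from the samples) that calls both functions on the same texture object. Everything named here is made up for illustration: the kernel lodVersusGrad, the host helper sampleBoth, and the tiny two-level pyramid in which level 0 is a 4x4 image filled with v0 and level 1 is a 2x2 image filled with v1. The gradients are given in normalized texture coordinates, and exactly how the hardware maps them to a level is something I am assuming follows the usual footprint-based convention rather than something I have verified.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t e_ = (call);                                         \
        if (e_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s: %s\n", #call,                      \
                         cudaGetErrorString(e_));                        \
            std::exit(1);                                                \
        }                                                                \
    } while (0)

// out[0]: level passed explicitly. out[1]: level inferred from gradients.
__global__ void lodVersusGrad(cudaTextureObject_t tex, float u, float v,
                              float lod, float2 dx, float2 dy, float* out)
{
    out[0] = tex2DLod<float>(tex, u, v, lod);
    out[1] = tex2DGrad<float>(tex, u, v, dx, dy);
}

// Hypothetical harness: builds the two-level constant pyramid, samples its
// center both ways, and writes the two values to result[0] and result[1].
// texelStep is the assumed footprint per pixel, in level-0 texels.
void sampleBoth(float v0, float v1, float lod, float texelStep,
                float result[2])
{
    const int w = 4, h = 4;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaMipmappedArray_t mip;
    CHECK(cudaMallocMipmappedArray(&mip, &desc, make_cudaExtent(w, h, 0), 2));

    // Fill each level from host memory (kept on the host only for brevity).
    const float values[2] = {v0, v1};
    for (unsigned l = 0; l < 2; ++l) {
        const size_t lw = w >> l, lh = h >> l;
        std::vector<float> host(lw * lh, values[l]);
        cudaArray_t level;
        CHECK(cudaGetMipmappedArrayLevel(&level, mip, l));
        CHECK(cudaMemcpy2DToArray(level, 0, 0, host.data(),
                                  lw * sizeof(float), lw * sizeof(float), lh,
                                  cudaMemcpyHostToDevice));
    }

    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeMipmappedArray;
    res.res.mipmap.mipmap = mip;

    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.normalizedCoords = 1;
    td.filterMode = cudaFilterModeLinear;        // bilinear within a level
    td.mipmapFilterMode = cudaFilterModeLinear;  // linear between levels
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.maxMipmapLevelClamp = 1.0f;               // two levels: 0 and 1
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));

    // Convert the footprint from texels to normalized coordinates.
    const float2 dx = make_float2(texelStep / w, 0.0f);
    const float2 dy = make_float2(0.0f, texelStep / h);

    float* d_out = nullptr;
    CHECK(cudaMalloc(&d_out, 2 * sizeof(float)));
    lodVersusGrad<<<1, 1>>>(tex, 0.5f, 0.5f, lod, dx, dy, d_out);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(result, d_out, 2 * sizeof(float),
                     cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d_out));
    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeMipmappedArray(mip));
}
```

Calling sampleBoth(1.0f, 3.0f, 1.0f, 2.0f, r) should put the level-1 value in r[0]; r[1] should land somewhere between the two level values, depending on which level the hardware derives from the gradients.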

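By the way, I’m fairly certain this expects an underlying CUDA array (a mipmapped one, if you want more than one level), not “plain image data in regular CUDA memory” and not a “regular cudaMalloc()'ed array”, so the image data would first have to be copied into such an array.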
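Since the mipmap levels in the question already exist in linear memory, getting them into a CUDA array could look roughly like the sketch below. It assumes single-channel float data with one tightly packed cudaMalloc()'ed buffer per level, where each level is half the size of the previous one. The helper names (makeMipmappedTexture, sampleLod, sampleConstantPyramid) are hypothetical, and error handling is reduced to a single macro.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t e_ = (call);                                         \
        if (e_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s: %s\n", #call,                      \
                         cudaGetErrorString(e_));                        \
            std::exit(1);                                                \
        }                                                                \
    } while (0)

// Hypothetical helper: copies an existing pyramid (d_levels[l] is a tightly
// packed device buffer for level l) into a mipmapped CUDA array and returns
// a texture object for it. The caller owns *mipOut and the texture object.
cudaTextureObject_t makeMipmappedTexture(const std::vector<float*>& d_levels,
                                         int width, int height,
                                         cudaMipmappedArray_t* mipOut)
{
    const unsigned numLevels = static_cast<unsigned>(d_levels.size());
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    CHECK(cudaMallocMipmappedArray(mipOut, &desc,
                                   make_cudaExtent(width, height, 0),
                                   numLevels));

    for (unsigned l = 0; l < numLevels; ++l) {
        const size_t w = (width >> l) > 0 ? (width >> l) : 1;
        const size_t h = (height >> l) > 0 ? (height >> l) : 1;
        cudaArray_t level;
        CHECK(cudaGetMipmappedArrayLevel(&level, *mipOut, l));
        // Device-to-device copy from linear memory into the array level.
        CHECK(cudaMemcpy2DToArray(level, 0, 0, d_levels[l],
                                  w * sizeof(float), w * sizeof(float), h,
                                  cudaMemcpyDeviceToDevice));
    }

    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeMipmappedArray;
    res.res.mipmap.mipmap = *mipOut;

    cudaTextureDesc td;
    std::memset(&td, 0, sizeof(td));
    td.normalizedCoords = 1;
    td.filterMode = cudaFilterModeLinear;        // bilinear within a level
    td.mipmapFilterMode = cudaFilterModeLinear;  // linear between levels
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.maxMipmapLevelClamp = static_cast<float>(numLevels - 1);
    td.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    CHECK(cudaCreateTextureObject(&tex, &res, &td, nullptr));
    return tex;
}

__global__ void sampleLod(cudaTextureObject_t tex, float u, float v,
                          float lod, float* out)
{
    *out = tex2DLod<float>(tex, u, v, lod);
}

// Hypothetical demo: a 4x4 level filled with v0 and a 2x2 level filled with
// v1, both in cudaMalloc()'ed memory, sampled at the center at the given LOD.
float sampleConstantPyramid(float v0, float v1, float lod)
{
    const int w = 4, h = 4;
    const std::vector<float> h0(w * h, v0), h1((w / 2) * (h / 2), v1);
    std::vector<float*> d_levels(2, nullptr);
    CHECK(cudaMalloc(&d_levels[0], h0.size() * sizeof(float)));
    CHECK(cudaMalloc(&d_levels[1], h1.size() * sizeof(float)));
    CHECK(cudaMemcpy(d_levels[0], h0.data(), h0.size() * sizeof(float),
                     cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_levels[1], h1.data(), h1.size() * sizeof(float),
                     cudaMemcpyHostToDevice));

    cudaMipmappedArray_t mip;
    cudaTextureObject_t tex = makeMipmappedTexture(d_levels, w, h, &mip);

    float* d_out = nullptr;
    float result = 0.0f;
    CHECK(cudaMalloc(&d_out, sizeof(float)));
    sampleLod<<<1, 1>>>(tex, 0.5f, 0.5f, lod, d_out);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d_out));
    CHECK(cudaDestroyTextureObject(tex));
    CHECK(cudaFreeMipmappedArray(mip));
    CHECK(cudaFree(d_levels[0]));
    CHECK(cudaFree(d_levels[1]));
    return result;
}
```

After the copy, the original linear buffers are no longer needed for sampling. A real image would likely come from cudaMallocPitch(), in which case the source pitch passed to cudaMemcpy2DToArray would be the allocation pitch rather than w * sizeof(float).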