I am working on a custom image-warping task in CUDA, and so far I have implemented mipmapping and trilinear sampling in plain CUDA. This is suboptimal for caching, since plain global memory access has no knowledge of the 2D spatial locality of my image data. I also assume I am spending general-purpose cycles on texture interpolation that dedicated hardware could handle.
I would like to optimize this step by calling the tex2DLod() and tex2DGrad() functions from inside my regular kernel function.
I have not been able to find an example to work from. I need something that demonstrates how to get the data in my regular cudaMalloc()'ed array into the structures required by the aforementioned tex2D* functions.
Is there an example or article available that I could use for inspiration?
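To make the question concrete, here is a rough sketch of what I imagine the setup might look like, based on my reading of the runtime API; it is not a verified implementation. It assumes single-channel float data, one tightly packed cudaMalloc()'ed buffer per mip level, and normalized texture coordinates. The names makeMipmappedTexture and sampleKernel are hypothetical, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>

// Hypothetical helper: copies per-level device buffers (single-channel float,
// tightly packed rows) into a cudaMipmappedArray and wraps it in a texture object.
cudaTextureObject_t makeMipmappedTexture(const float* const* d_levels,
                                         int width, int height, int numLevels,
                                         cudaMipmappedArray_t* outArray)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(width, height, 0);
    cudaMallocMipmappedArray(outArray, &desc, extent, numLevels);

    for (int l = 0; l < numLevels; ++l) {
        // Each level is half the size of the previous one, but at least 1 texel.
        int w = std::max(1, width >> l);
        int h = std::max(1, height >> l);
        cudaArray_t levelArray;
        cudaGetMipmappedArrayLevel(&levelArray, *outArray, l);
        cudaMemcpy2DToArray(levelArray, 0, 0, d_levels[l],
                            w * sizeof(float), w * sizeof(float), h,
                            cudaMemcpyDeviceToDevice);
    }

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeMipmappedArray;
    resDesc.res.mipmap.mipmap = *outArray;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModeLinear;        // bilinear within a level
    texDesc.mipmapFilterMode = cudaFilterModeLinear;  // blend between levels
    texDesc.readMode = cudaReadModeElementType;
    texDesc.normalizedCoords = 1;                     // assumed: coords in [0, 1]
    texDesc.maxMipmapLevelClamp = float(numLevels - 1);

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// Hypothetical kernel: samples once with an explicit LOD and once with
// explicit gradients, writing one result per output pixel.
__global__ void sampleKernel(cudaTextureObject_t tex, float* outLod,
                             float* outGrad, int outW, int outH, float lod)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= outW || y >= outH) return;

    // Normalized coordinates of the output pixel centre.
    float u = (x + 0.5f) / outW;
    float v = (y + 0.5f) / outH;

    // Change in (u, v) per output pixel step in x and in y.
    float2 dx = make_float2(1.0f / outW, 0.0f);
    float2 dy = make_float2(0.0f, 1.0f / outH);

    outLod[y * outW + x]  = tex2DLod<float>(tex, u, v, lod);
    outGrad[y * outW + x] = tex2DGrad<float>(tex, u, v, dx, dy);
}
```

As far as I can tell, the texture cannot sit directly on top of the cudaMalloc()'ed memory when mipmaps are involved, so the data has to be copied into the array levels once; cleanup would presumably be cudaDestroyTextureObject() followed by cudaFreeMipmappedArray().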
Kind regards
Jesper