I’m new to CUDA. I’m writing 3D filters using 3D textures.
I’d like to know whether there is a specific function for iterating over all the neighboring texels of a given texel.
For instance, for a minimum filter, for each texel I need to visit all the texels within a surrounding sphere and find the minimum value.
I’m currently passing a precomputed array with all the coordinates of the sphere to the kernel as an argument, and the kernel iterates over it. But I find it rather slow.
Are there more optimized solutions?
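For reference, a minimal sketch of the offset-table approach described above. This is not the actual code from the post: the names `minFilterOffsets`, `sphereOffsets` and `runMinFilterOffsets` are hypothetical, and it assumes a single-channel float volume, a texture object with point sampling and clamped addressing, and one thread per voxel.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cfloat>
#include <vector>

// Hypothetical kernel: each thread walks the precomputed sphere offsets
// and keeps the minimum texel value it sees.
__global__ void minFilterOffsets(cudaTextureObject_t tex, const int3* offsets,
                                 int nOffsets, float* out, int W, int H, int D)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= W || y >= H || z >= D) return;
    float m = FLT_MAX;
    for (int i = 0; i < nOffsets; ++i) {
        int3 o = offsets[i];
        // +0.5f addresses texel centres (unnormalized coordinates);
        // clamped addressing handles reads outside the volume.
        m = fminf(m, tex3D<float>(tex, x + o.x + 0.5f, y + o.y + 0.5f, z + o.z + 0.5f));
    }
    out[(z * H + y) * W + x] = m;
}

// Host side: list every integer offset inside a sphere of radius r.
std::vector<int3> sphereOffsets(int r)
{
    std::vector<int3> v;
    for (int dz = -r; dz <= r; ++dz)
        for (int dy = -r; dy <= r; ++dy)
            for (int dx = -r; dx <= r; ++dx)
                if (dx * dx + dy * dy + dz * dz <= r * r)
                    v.push_back(make_int3(dx, dy, dz));
    return v;
}

// Host helper (hypothetical): upload the volume as a 3D texture, run the filter.
std::vector<float> runMinFilterOffsets(const std::vector<float>& h, int W, int H, int D, int r)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent ext = make_cudaExtent(W, H, D);   // in elements for cudaArray
    cudaArray_t arr;
    cudaMalloc3DArray(&arr, &desc, ext);
    cudaMemcpy3DParms cp = {};
    cp.srcPtr = make_cudaPitchedPtr(const_cast<float*>(h.data()), W * sizeof(float), W, H);
    cp.dstArray = arr;
    cp.extent = ext;
    cp.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&cp);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = td.addressMode[1] = td.addressMode[2] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;         // no interpolation
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    std::vector<int3> offs = sphereOffsets(r);
    int3* dOffs = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dOffs, offs.size() * sizeof(int3));
    cudaMalloc(&dOut, h.size() * sizeof(float));
    cudaMemcpy(dOffs, offs.data(), offs.size() * sizeof(int3), cudaMemcpyHostToDevice);

    dim3 block(8, 8, 8), grid((W + 7) / 8, (H + 7) / 8, (D + 7) / 8);
    minFilterOffsets<<<grid, block>>>(tex, dOffs, (int)offs.size(), dOut, W, H, D);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;  // silence unused-variable warnings when NDEBUG is defined

    std::vector<float> out(h.size());
    cudaMemcpy(out.data(), dOut, out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(dOffs);
    cudaFree(dOut);
    return out;
}
```

Every thread reads the whole offset table and does one texture fetch per entry, so the cost per voxel grows with the number of positions in the sphere.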
FYI, DirectX has a texture fetch function that returns the 4 neighbouring samples that would have been used for bilinear filtering:
Unfortunately this is only for 2D lookups and isn’t supported in CUDA yet.
How big is the kernel? You could try loading the neighbourhood into shared memory.
Thanks for your answer. By kernel, do you mean the kernel of the filter? If so, it’s a parameter: I use radii from 5 to 15 pixels, which means ~2000 to 57000 positions to visit per thread, and that makes the run very slow (a 20x speedup compared to the CPU, with a Quadro 600).
If I understand correctly, it could be faster to load the pixels (plus their surroundings) processed by a block into shared memory? How can you do that?
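One way the shared-memory idea could look, as a rough sketch rather than a definitive answer: each block cooperatively copies its voxels plus a halo of `R` voxels on every side into shared memory, synchronizes, and then every thread scans the sphere inside that tile. The assumptions here are not from the thread: the volume sits in plain global memory instead of a texture, the radius `R` and block edge `B` are compile-time constants, borders are clamped, and the names `minFilterShared` and `runMinFilterShared` are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cfloat>
#include <vector>

constexpr int B = 8;           // block edge (assumption): 8x8x8 threads
constexpr int R = 2;           // filter radius (assumption): kept small so the tile fits
constexpr int T = B + 2 * R;   // tile edge including the halo

__global__ void minFilterShared(const float* in, float* out, int W, int H, int D)
{
    __shared__ float tile[T][T][T];

    // Volume coordinates of the tile's first voxel (halo included).
    int ox = blockIdx.x * B - R;
    int oy = blockIdx.y * B - R;
    int oz = blockIdx.z * B - R;

    // Cooperative load: the B^3 threads stride over the T^3 tile voxels.
    int tid = (threadIdx.z * B + threadIdx.y) * B + threadIdx.x;
    for (int i = tid; i < T * T * T; i += B * B * B) {
        int tx = i % T, ty = (i / T) % T, tz = i / (T * T);
        int gx = min(max(ox + tx, 0), W - 1);   // clamp at the volume border
        int gy = min(max(oy + ty, 0), H - 1);
        int gz = min(max(oz + tz, 0), D - 1);
        tile[tz][ty][tx] = in[(gz * H + gy) * W + gx];
    }
    __syncthreads();   // the whole tile must be loaded before anyone reads it

    int x = blockIdx.x * B + threadIdx.x;
    int y = blockIdx.y * B + threadIdx.y;
    int z = blockIdx.z * B + threadIdx.z;
    if (x >= W || y >= H || z >= D) return;

    // Scan the sphere around this thread's voxel, reading shared memory only.
    float m = FLT_MAX;
    for (int dz = -R; dz <= R; ++dz)
        for (int dy = -R; dy <= R; ++dy)
            for (int dx = -R; dx <= R; ++dx)
                if (dx * dx + dy * dy + dz * dz <= R * R)
                    m = fminf(m, tile[threadIdx.z + R + dz]
                                     [threadIdx.y + R + dy]
                                     [threadIdx.x + R + dx]);
    out[(z * H + y) * W + x] = m;
}

// Host helper (hypothetical): copy in, launch, copy out.
std::vector<float> runMinFilterShared(const std::vector<float>& h, int W, int H, int D)
{
    size_t bytes = h.size() * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(B, B, B), grid((W + B - 1) / B, (H + B - 1) / B, (D + B - 1) / B);
    minFilterShared<<<grid, block>>>(dIn, dOut, W, H, D);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);
    (void)err;  // silence unused-variable warnings when NDEBUG is defined

    std::vector<float> out(h.size());
    cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return out;
}
```

The tile size is the catch with large radii. With an 8x8x8 block, radius 5 needs 18^3 floats (about 23 KB), which fits in the 48 KB of shared memory typically available per block, but radius 15 needs 38^3 floats (about 214 KB), which does not. For the larger radii the neighbourhood would have to be processed in several passes, or the shared-memory approach dropped.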