What's the best way to work with n-dimensional threads?

Hello,

I would like to work on n-dimensional data with CUDA, implementing a simple Laplacian mask.

Since textures (and therefore the texture cache) support at most three dimensions, I store the data as a flat 1D array and use shared memory to hold the neighbor values.
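A minimal sketch of the flat 1D layout, assuming row-major order (last index contiguous). The helper names `computeStrides`, `flatIndex` and `unflatten` are hypothetical, not part of any CUDA API; they are marked `__host__ __device__` so the same index math can be checked on the CPU and reused inside a kernel.

```cuda
#include <cstddef>

// Row-major strides (hypothetical helper): the last dimension is contiguous
// (stride 1) and strides[d] is the product of extents[d+1 .. ndims-1].
__host__ __device__ inline void computeStrides(const int* extents, int ndims, size_t* strides)
{
    size_t s = 1;
    for (int d = ndims - 1; d >= 0; --d) {
        strides[d] = s;
        s *= static_cast<size_t>(extents[d]);
    }
}

// Convert an N-dimensional coordinate into an offset into the flat 1D array.
__host__ __device__ inline size_t flatIndex(const int* coord, const size_t* strides, int ndims)
{
    size_t idx = 0;
    for (int d = 0; d < ndims; ++d)
        idx += static_cast<size_t>(coord[d]) * strides[d];
    return idx;
}

// Inverse mapping: recover the coordinate of a flat offset (e.g. a global thread id).
__host__ __device__ inline void unflatten(size_t idx, const int* extents, int ndims, int* coord)
{
    for (int d = ndims - 1; d >= 0; --d) {
        coord[d] = static_cast<int>(idx % static_cast<size_t>(extents[d]));
        idx /= static_cast<size_t>(extents[d]);
    }
}
```

With this layout, the two neighbors of a point along dimension d sit at `idx - strides[d]` and `idx + strides[d]`, which is why only the last dimension gives truly adjacent addresses.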

However, quoting this thread: http://forums.nvidia.com/index.php?showtopic=192103

It's obvious that N-dimensional data cannot be arranged so that all of a thread's neighbors lie in nearby regions of memory.

So what's the best option here?

I have found that I have sometimes spent a lot of time trying to get the best possible solution to something when the simple approach would have performed well enough and could have been written in 1/10th the time. So I would try a simple approach and if that doesn’t perform well enough then try to improve the memory access patterns.
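In that spirit, here is a sketch of the simple approach, assuming a row-major `float` array: one thread per element, all 2N neighbors read directly from global memory with no shared-memory staging, and boundary points skipped. `Grid`, `makeGrid`, `laplacianAt`, `laplacianNaive` and `MAX_DIMS` are hypothetical names chosen for this example.

```cuda
#include <cstddef>

// Hypothetical compile-time limit on the number of dimensions.
constexpr int MAX_DIMS = 8;

// Geometry of a row-major N-d array, passed to the kernel by value.
struct Grid {
    int ndims;
    int extents[MAX_DIMS];
    size_t strides[MAX_DIMS];   // strides[ndims-1] == 1
    size_t total;               // number of elements
};

__host__ __device__ inline Grid makeGrid(const int* extents, int ndims)
{
    Grid g;
    g.ndims = ndims;
    size_t s = 1;
    for (int d = ndims - 1; d >= 0; --d) {
        g.extents[d] = extents[d];
        g.strides[d] = s;
        s *= static_cast<size_t>(extents[d]);
    }
    g.total = s;
    return g;
}

// Discrete Laplacian at flat index i:
//   sum over d of (u[i - s_d] + u[i + s_d]) - 2 * ndims * u[i].
// Returns false for boundary points, which this sketch leaves untouched.
__host__ __device__ inline bool laplacianAt(const float* u, const Grid& g, size_t i, float* out)
{
    float sum = -2.0f * g.ndims * u[i];
    size_t rem = i;
    for (int d = g.ndims - 1; d >= 0; --d) {
        // Recover coordinate d and bail out before touching out-of-range neighbors.
        int c = static_cast<int>(rem % static_cast<size_t>(g.extents[d]));
        rem /= static_cast<size_t>(g.extents[d]);
        if (c == 0 || c == g.extents[d] - 1) return false;
        sum += u[i - g.strides[d]] + u[i + g.strides[d]];
    }
    *out = sum;
    return true;
}

// One thread per element; neighbors come straight from global memory and the
// hardware caches (L1/L2) pick up whatever reuse they can.
__global__ void laplacianNaive(const float* __restrict__ u, float* __restrict__ lap, Grid g)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i >= g.total) return;
    float v;
    if (laplacianAt(u, g, i, &v)) lap[i] = v;
}
```

It could be launched with something like `laplacianNaive<<<(g.total + 255) / 256, 256>>>(d_u, d_lap, g);`, where `d_u` and `d_lap` are hypothetical device buffers. Boundary entries of `lap` are never written, so initialize that buffer first. Because `laplacianAt` is also callable on the host, it can double as a CPU reference when checking a later, more optimized kernel.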

NB: the link seems to be the wrong one.

I completely agree with kbam. In higher dimensions the number of neighbors quickly grows larger than what fits into one cacheline, resulting in no benefit from spatial locality at all.
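To put numbers on the cache-line argument, the sketch below counts how many of the 2N stencil neighbors share a cache line with the center element. It assumes a line-aligned, row-major `float` array and 128-byte lines (the line size is an assumption; check it for your GPU). `neighborsInSameLine` and `LINE_BYTES` are hypothetical names.

```cuda
#include <cstddef>

// Assumed cache-line size in bytes; adjust for the target GPU.
constexpr size_t LINE_BYTES = 128;

// Count how many of the 2*ndims stencil neighbors of flat index i fall in the
// same cache line as element i itself (index arithmetic only, no memory access).
__host__ __device__ inline int neighborsInSameLine(const int* extents, int ndims, size_t i)
{
    const size_t line = (i * sizeof(float)) / LINE_BYTES;
    size_t stride = 1;   // stride of dimension d, starting from the contiguous one
    int count = 0;
    for (int d = ndims - 1; d >= 0; --d) {
        // Neighbors along dimension d are +/- stride elements away.
        if (i >= stride && ((i - stride) * sizeof(float)) / LINE_BYTES == line) ++count;
        if (((i + stride) * sizeof(float)) / LINE_BYTES == line) ++count;
        stride *= static_cast<size_t>(extents[d]);
    }
    return count;
}
```

For a 64×64×64×64 array (strides 262144, 4096, 64 and 1 elements), an interior point that is neither the first nor the last element of its line gets only 2 of its 8 neighbors from its own line; every neighbor along the three slower dimensions is at least 256 bytes away and needs a separate line.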