What's the best way to work with n-dimension thread?


I would like to work on n-dimensional data with CUDA, implementing a simple Laplacian mask.

As the texture cache supports at most 3 dimensions, I use data arranged in 1 dimension and shared memory to store the neighbors' values.

However, quoting this thread

It’s obvious that N-dimensional data cannot be arranged in such a way that all neighbors of a thread are in nearby regions of memory.

So what’s the best option to do that?

I have found that I have sometimes spent a lot of time trying to get the best possible solution to something when the simple approach would have performed well enough and could have been written in 1/10th the time. So I would try a simple approach and if that doesn’t perform well enough then try to improve the memory access patterns.
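
To make the "simple approach" concrete, here is a minimal sketch, assuming one thread per element, a flattened array with axis 0 varying fastest, an axis-aligned (2N+1)-point Laplacian, and boundary elements simply set to zero. The names `Grid`, `make_grid`, `laplacian_at`, `laplacian_kernel`, `laplacian_host` and the `MAX_DIMS` limit are all made up for illustration. The kernel is guarded so the same file also builds as plain C++ for checking against a host reference.

```cpp
#include <vector>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr int MAX_DIMS = 8;  // assumed upper bound on the number of dimensions

// Describes an N-dimensional grid stored as one flat array (axis 0 fastest).
struct Grid {
    int ndim;
    int size[MAX_DIMS];
    long long stride[MAX_DIMS];  // distance in elements between neighbors along each axis
    long long total;             // total number of elements
};

inline Grid make_grid(const std::vector<int>& sizes) {
    Grid g{};
    g.ndim = static_cast<int>(sizes.size());
    long long s = 1;
    for (int d = 0; d < g.ndim; ++d) {
        g.size[d] = sizes[d];
        g.stride[d] = s;
        s *= sizes[d];
    }
    g.total = s;
    return g;
}

// Axis-aligned Laplacian at flat index idx, read straight from global memory.
// Boundary elements return 0 (an assumption; pick whatever boundary rule you need).
HD inline float laplacian_at(const float* in, const Grid& g, long long idx) {
    float acc = 0.0f;
    long long rest = idx;
    for (int d = 0; d < g.ndim; ++d) {
        int c = static_cast<int>(rest % g.size[d]);  // coordinate along axis d
        rest /= g.size[d];
        if (c == 0 || c == g.size[d] - 1) return 0.0f;
        acc += in[idx - g.stride[d]] + in[idx + g.stride[d]];
    }
    return acc - 2.0f * g.ndim * in[idx];
}

#ifdef __CUDACC__
// One thread per element, 1-D launch over the flattened array.
__global__ void laplacian_kernel(const float* in, float* out, Grid g) {
    long long idx = blockIdx.x * static_cast<long long>(blockDim.x) + threadIdx.x;
    if (idx < g.total) out[idx] = laplacian_at(in, g, idx);
}
#endif

// Host reference using the same per-element function.
inline std::vector<float> laplacian_host(const std::vector<float>& in, const Grid& g) {
    std::vector<float> out(in.size());
    for (long long i = 0; i < g.total; ++i) out[i] = laplacian_at(in.data(), g, i);
    return out;
}
```

On the device this would be launched as something like `laplacian_kernel<<<(g.total + 255) / 256, 256>>>(d_in, d_out, g);`, where `d_in` and `d_out` are hypothetical device pointers. Note that consecutive threads read consecutive addresses for every neighbor load (each thread adds the same stride), so the loads within a warp should still coalesce even though the neighbors of a single thread are far apart. I would measure this version first before adding shared memory.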

NB the link seems to be the wrong one.

I completely agree with kbam. In higher dimensions the number of neighbors quickly grows larger than what fits into one cacheline, resulting in no benefit from spatial locality at all.
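
To put rough numbers on that, here is a small host-side sketch that computes how far apart in memory the neighbors along each axis are for a flattened array with axis 0 fastest. The function names are made up for illustration, and the 128-byte line size used in the example below is an assumption about the hardware, so substitute the value for your GPU.

```cpp
#include <vector>

// Byte distance from an element to its +1 neighbor along each axis,
// for a flattened array where axis 0 varies fastest.
inline std::vector<long long> neighbor_byte_offsets(const std::vector<int>& sizes,
                                                    int elem_bytes) {
    std::vector<long long> offsets;
    long long stride = 1;  // stride in elements along the current axis
    for (int n : sizes) {
        offsets.push_back(stride * elem_bytes);
        stride *= n;
    }
    return offsets;
}

// Number of axes whose neighbors are closer than one cache line to the center,
// i.e. the only axes where a neighbor can possibly share a line with it.
inline int axes_within_line(const std::vector<int>& sizes, int elem_bytes,
                            int line_bytes) {
    int count = 0;
    for (long long b : neighbor_byte_offsets(sizes, elem_bytes)) {
        if (b < line_bytes) ++count;
    }
    return count;
}
```

For a 32x32x32x32 grid of 4-byte floats the offsets come out as 4, 128, 4096 and 131072 bytes, so with an assumed 128-byte line only the axis-0 neighbors can sit in the same line as the center element; every other neighbor of that thread touches a different line.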