I have found that I have sometimes spent a lot of time trying to get the best possible solution to something when the simple approach would have performed well enough and could have been written in 1/10th the time. So I would try a simple approach and if that doesn’t perform well enough then try to improve the memory access patterns.
I completely agree with kbam. In higher dimensions the number of neighbors quickly grows larger than what fits into one cacheline, resulting in no benefit from spatial locality at all.