Grid size & performance


I’m currently working on my first project with CUDA.
What confuses me are the grid and block sizes. Do they have any impact on the performance?

More specifically:
Let’s assume I have a problem that consists of doing a calculation for lots of points in a 3D space. The calculations for every point are independent, so I don’t need any shared memory.
Of course, I could create a 3D grid of 3D blocks and split up the problem according to where the points lie in the 3D space.
But, for simplicity, I just created a 1D grid with 1D blocks, and my kernel walks through the points in linear index order. The GPU is at 100% utilization, and no core seems to be idle. There should be no performance benefit if I switch to a 3D grid/block, right?
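To make the 1D setup concrete, here is a minimal sketch of what such a launch could look like. The names `compute_point`, `unflatten`, `process_points_1d` and `launch_process_points_1d`, the dimensions `nx`, `ny`, `nz`, and the block size of 256 are all assumptions made up for illustration; they are not taken from the actual project. The sketch also assumes that the total number of points is non-zero and fits in an `int`.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-point calculation; stands in for the real, independent work.
__host__ __device__ inline float compute_point(int x, int y, int z)
{
    return 1.0f * x + 2.0f * y + 3.0f * z;
}

// Recover 3D coordinates from a flat index (x varies fastest, then y, then z).
__host__ __device__ inline void unflatten(int i, int nx, int ny,
                                          int &x, int &y, int &z)
{
    x = i % nx;
    y = (i / nx) % ny;
    z = i / (nx * ny);
}

// 1D grid of 1D blocks. The grid-stride loop lets any grid size cover all points.
__global__ void process_points_1d(float *out, int nx, int ny, int nz)
{
    int total  = nx * ny * nz;
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < total; i += stride) {
        int x, y, z;
        unflatten(i, nx, ny, x, y, z);
        out[i] = compute_point(x, y, z);
    }
}

// Host-side helper: d_out must point to nx * ny * nz floats in device memory.
cudaError_t launch_process_points_1d(float *d_out, int nx, int ny, int nz)
{
    int total   = nx * ny * nz;
    int threads = 256;                              // assumed block size
    int blocks  = (total + threads - 1) / threads;  // round up
    process_points_1d<<<blocks, threads>>>(d_out, nx, ny, nz);
    return cudaGetLastError();
}
```

The 3D coordinates are rebuilt from the flat index inside the kernel, so the launch itself never needs to know that the problem is three-dimensional.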

But then let’s go a bit further and assume that points in my 3D space that are close to each other need to access similar global memory locations. IMO it would then make sense to split up the problem so that a block always contains points that are close to each other, because all threads of a block are processed on the same multiprocessor (SM), which means that they share its L1 cache.
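One way this could be sketched while still keeping a 1D launch: let each block cover a compact 3D tile of points instead of a run of consecutive flat indices. The tile shape of 8 x 8 x 4, the `Point3` struct, and the names `tiled_point`, `process_tiled` and `launch_process_tiled` are hypothetical, and the kernel body is only a placeholder that records which block handled each point. Whether this actually improves cache hit rates depends on the access pattern and the hardware, so it would have to be measured.

```cuda
#include <cuda_runtime.h>

// Assumed tile shape: 8 * 8 * 4 = 256 points, one point per thread of a block.
constexpr int TX = 8, TY = 8, TZ = 4;

struct Point3 { int x, y, z; };

// Map (block, thread) of a 1D launch to a 3D point so that each block
// covers one compact TX x TY x TZ tile of the volume.
__host__ __device__ inline Point3 tiled_point(int block, int thread,
                                              int tiles_x, int tiles_y)
{
    // Which tile this block owns.
    int bx = block % tiles_x;
    int by = (block / tiles_x) % tiles_y;
    int bz = block / (tiles_x * tiles_y);
    // Position of this thread inside the tile.
    int tx = thread % TX;
    int ty = (thread / TX) % TY;
    int tz = thread / (TX * TY);
    return Point3{bx * TX + tx, by * TY + ty, bz * TZ + tz};
}

__global__ void process_tiled(float *out, int nx, int ny, int nz,
                              int tiles_x, int tiles_y)
{
    Point3 p = tiled_point(blockIdx.x, threadIdx.x, tiles_x, tiles_y);
    // Tiles on the border may stick out of the volume, so guard the access.
    if (p.x < nx && p.y < ny && p.z < nz) {
        // Placeholder work: store the index of the block that handled the point.
        out[(p.z * ny + p.y) * nx + p.x] = static_cast<float>(blockIdx.x);
    }
}

// Host-side helper: d_out must point to nx * ny * nz floats in device memory.
cudaError_t launch_process_tiled(float *d_out, int nx, int ny, int nz)
{
    int tiles_x = (nx + TX - 1) / TX;
    int tiles_y = (ny + TY - 1) / TY;
    int tiles_z = (nz + TZ - 1) / TZ;
    int blocks  = tiles_x * tiles_y * tiles_z;
    process_tiled<<<blocks, TX * TY * TZ>>>(d_out, nx, ny, nz, tiles_x, tiles_y);
    return cudaGetLastError();
}
```

The launch is still one-dimensional; only the index arithmetic decides which points end up together in a block.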

Is my understanding correct that multidimensional grids/blocks are just a “nice-to-have” feature to simplify indexing inside the kernels, because theoretically every problem could be broken down into a simple 1D array of problems?

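It’s exactly as you describe. The dimensionality of the grid and blocks is only an indexing convenience; what can matter for performance is how many threads a block has and which points end up together in the same block.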