Impact of Grid and Block Dimension on performance

Do the dimensions of a grid and/or block impact the performance of an application?
E.g. let's say we have a vector of N elements and we want to search for the vector element at index i, call it “a”, in row i
of an N x N matrix. Assume we want to do it using N*N threads.

We can do it with a 2D grid of 1D blocks: grid Dim(N/BLOCK_WIDTH, N, 1) and block Dim(BLOCK_WIDTH, 1, 1).
Or we can do it with a 2D grid of 2D blocks: grid Dim(N/BLOCK_WIDTH, N/BLOCK_HEIGHT, 1) and block Dim(BLOCK_WIDTH, BLOCK_HEIGHT, 1).
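
To make the two launch configurations concrete, here is a minimal sketch, not the poster's actual code. The names searchRows, runSearch and hit, the BLOCK_WIDTH/BLOCK_HEIGHT values, and the reading of the task as "find a[i] in row i" are all assumptions made for illustration. It also rounds the grid size up, where the formulas above assume N divides evenly.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_WIDTH  32   // assumed value, not from the original post
#define BLOCK_HEIGHT 8    // assumed value, not from the original post

// One thread per matrix element. The same index arithmetic serves both
// layouts: with 1D blocks blockDim.y == 1 and threadIdx.y == 0, so the
// row is simply blockIdx.y.
__global__ void searchRows(const int *m, const int *a, int *hit, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n && m[row * n + col] == a[row])
        atomicMax(&hit[row], col);   // keep the largest matching column
}

// Hypothetical host helper: runs the search with the given block shape
// and returns, per row, a matching column index or -1 if none was found.
std::vector<int> runSearch(const std::vector<int> &m,
                           const std::vector<int> &a, int n, dim3 block)
{
    int *dM, *dA, *dHit;
    cudaMalloc(&dM,   sizeof(int) * n * n);
    cudaMalloc(&dA,   sizeof(int) * n);
    cudaMalloc(&dHit, sizeof(int) * n);
    cudaMemcpy(dM, m.data(), sizeof(int) * n * n, cudaMemcpyHostToDevice);
    cudaMemcpy(dA, a.data(), sizeof(int) * n,     cudaMemcpyHostToDevice);
    cudaMemset(dHit, 0xFF, sizeof(int) * n);      // all bits set: every int is -1

    // Round up so sizes that are not a multiple of the block shape still work.
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y, 1);
    searchRows<<<grid, block>>>(dM, dA, dHit, n);

    std::vector<int> hit(n);
    cudaMemcpy(hit.data(), dHit, sizeof(int) * n, cudaMemcpyDeviceToHost);
    cudaFree(dM); cudaFree(dA); cudaFree(dHit);
    return hit;
}
```

Calling runSearch(m, a, n, dim3(BLOCK_WIDTH, 1, 1)) gives the 2D grid of 1D blocks, and runSearch(m, a, n, dim3(BLOCK_WIDTH, BLOCK_HEIGHT, 1)) gives the 2D grid of 2D blocks. On the same input both should return the same result; only the way threads are grouped into blocks differs.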

Is there any performance gap between the two approaches?

Assuming the block dimensions do not imply a change in the number of threads per block:

From a hardware perspective, I would think that a) kernels are mostly dealt with at the block level, and b) the different blocks of a kernel are generally identical, except for their differing dimension indices.

From a code/instruction perspective, the question is whether the additional dimensions are relevant or redundant.
I can think of cases falling under both categories.

In any case, you should be able to test it empirically.
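
One way such an empirical test could look is sketched below, using CUDA events to time the same kernel under different block shapes. This is only an illustration under assumptions: searchRows and timeSearch are hypothetical names, and the device buffers are filled with placeholder values rather than real data.

```cuda
#include <cuda_runtime.h>

// Same comparison kernel for any block shape (hypothetical name).
__global__ void searchRows(const int *m, const int *a, int *hit, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n && m[row * n + col] == a[row])
        atomicMax(&hit[row], col);
}

// Hypothetical helper: average kernel time in milliseconds over `reps`
// launches for the given block shape.
float timeSearch(int n, dim3 block, int reps)
{
    int *dM, *dA, *dHit;
    cudaMalloc(&dM,   sizeof(int) * n * n);
    cudaMalloc(&dA,   sizeof(int) * n);
    cudaMalloc(&dHit, sizeof(int) * n);
    // Placeholder data: matrix all 0, targets all -1, so nothing matches.
    // A real test should use representative inputs.
    cudaMemset(dM,   0,    sizeof(int) * n * n);
    cudaMemset(dA,   0xFF, sizeof(int) * n);
    cudaMemset(dHit, 0xFF, sizeof(int) * n);

    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y, 1);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    searchRows<<<grid, block>>>(dM, dA, dHit, n);   // warm-up launch, not timed

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        searchRows<<<grid, block>>>(dM, dA, dHit, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                     // wait for the timed launches

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dM); cudaFree(dA); cudaFree(dHit);
    return ms / reps;
}
```

Comparing, say, timeSearch(4096, dim3(256, 1, 1), 100) with timeSearch(4096, dim3(32, 8, 1), 100) keeps the number of threads per block at 256 in both cases, so any difference comes from the block shape rather than the block size. The numbers will depend on the GPU and the data, so it is worth repeating the measurement for a few values of N.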