Do the dimensions of a grid and/or block affect the performance of an application?
For example, say we have a vector of N elements and we want to search for the vector element “a” at index i in row i
of an N x N matrix. Assume we want to do this using N*N threads.
We can do it with a 2D grid of 1D blocks: gridDim(N/BLOCK_WIDTH, N, 1) and blockDim(BLOCK_WIDTH, 1, 1).
Or we can do it with a 2D grid of 2D blocks: gridDim(N/BLOCK_WIDTH, N/BLOCK_HEIGHT, 1) and blockDim(BLOCK_WIDTH, BLOCK_HEIGHT, 1).
Is there any performance difference between the two approaches?
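
To make the two configurations concrete, here is a minimal host-side sketch in plain C++ (not an actual kernel launch) that emulates the usual index arithmetic, col = blockIdx.x * blockDim.x + threadIdx.x and row = blockIdx.y * blockDim.y + threadIdx.y, for both layouts. The Dim3 struct and all function names are hypothetical stand-ins I made up for illustration. I also assumed round-up division plus a bounds check so that N need not be a multiple of the block size, which the N/BLOCK_WIDTH form above does not handle. It only shows that both layouts map threads onto the same N x N elements; it says nothing about which one is faster.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3, used only in this host-side sketch.
struct Dim3 {
    unsigned x, y, z;
};

// Round-up division, so the grid still covers n when n is not a multiple of b.
inline unsigned ceil_div(unsigned a, unsigned b) {
    assert(b > 0);
    return (a + b - 1) / b;
}

// Option 1: 2D grid of 1D blocks (one block row per matrix row).
inline Dim3 grid_1d_blocks(unsigned n, unsigned bw) { return {ceil_div(n, bw), n, 1u}; }
inline Dim3 block_1d(unsigned bw) { return {bw, 1u, 1u}; }

// Option 2: 2D grid of 2D blocks (each block covers a bw x bh tile).
inline Dim3 grid_2d_blocks(unsigned n, unsigned bw, unsigned bh) {
    return {ceil_div(n, bw), ceil_div(n, bh), 1u};
}
inline Dim3 block_2d(unsigned bw, unsigned bh) { return {bw, bh, 1u}; }

// Emulates a launch on the host: walks every (block, thread) pair and counts
// how many emulated threads land on each element of the n x n matrix.
std::vector<unsigned> count_hits(Dim3 grid, Dim3 block, unsigned n) {
    std::vector<unsigned> hits(static_cast<std::size_t>(n) * n, 0u);
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx) {
                    unsigned row = by * block.y + ty;  // blockIdx.y * blockDim.y + threadIdx.y
                    unsigned col = bx * block.x + tx;  // blockIdx.x * blockDim.x + threadIdx.x
                    if (row < n && col < n)            // guard for partial blocks at the edges
                        ++hits[static_cast<std::size_t>(row) * n + col];
                }
    return hits;
}

// True when every matrix element was visited by exactly one emulated thread.
bool covers_once(const std::vector<unsigned>& hits) {
    for (unsigned h : hits)
        if (h != 1u) return false;
    return true;
}
```

For example, with N = 64, count_hits(grid_1d_blocks(64, 32), block_1d(32), 64) and count_hits(grid_2d_blocks(64, 16, 16), block_2d(16, 16), 64) should both visit every element exactly once. So the two approaches differ only in how threads are grouped into blocks, and that grouping is what the performance question is about.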