When I launch a kernel, I specify the grid dimension and the block dimension.
I think the block dimension can be found using the occupancy calculator.
But how do I find the grid dimension?
Also, suppose the block size is 512 threads and the number of blocks is below some limit L.
I expected kernels launched with 1 to L-1 blocks to reach the same bandwidth, because all of those blocks would execute concurrently.
But my experiment showed otherwise. Do you know why? :(
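
On the grid-dimension part of the question: for 1D kernels that assign one thread per element, a common starting point is to round the element count up to a whole number of blocks. The sketch below is only that arithmetic, written as host-side C++ that also compiles as CUDA host code. The helper name `blocks_for` and the element count of 1,000,000 are illustrative assumptions, not taken from the post; only the block size of 512 comes from it.

```cpp
// Hypothetical helper (not from the original post): the number of blocks
// needed so that blocks * threads_per_block covers all n elements.
// Assumes n + threads_per_block - 1 fits in an unsigned int.
constexpr unsigned int blocks_for(unsigned int n, unsigned int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;  // ceiling division
}

// Assumed problem size of 1,000,000 elements with 512 threads per block:
// 1953 full blocks cover 999,936 elements, so one more block is needed.
static_assert(blocks_for(1000000u, 512u) == 1954u, "rounds up to cover the tail");

// An exact multiple of the block size needs no extra block.
static_assert(blocks_for(1024u, 512u) == 2u, "no rounding when n divides evenly");
```

The result would be passed as the first launch parameter, with 512 as the second, and the kernel would need a bounds check so that the extra threads in the last block do nothing. This covers only how many blocks are needed to span the data; it does not say how many blocks run concurrently, so it does not by itself explain the bandwidth measurements.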