I am reading about CUDA and how blocks and grids are organized, using the book “Programming Massively Parallel Processors: A Hands-on Approach”. The book gives the following example:
vecAddKernel<<<dimGrid, dimBlock>>>(. . .);
The book says that this code generates a 1D grid of 128 blocks, each with 32 threads. Is this correct? As I understand it, it creates a 1D grid of 32 blocks, each with 128 threads.
Which statement is correct for the code above?
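To check this empirically, here is a minimal sketch of a probe kernel that reports the launch shape as seen from the device. As far as I understand the launch syntax, the first argument inside `<<<...>>>` is the grid dimension (number of blocks) and the second is the block dimension (threads per block). The values 32 and 128 are my assumption about how `dimGrid` and `dimBlock` are declared, and the names `LaunchShape`, `probeKernel`, and `probeLaunch` are hypothetical, not from the book.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper type: the launch shape as observed inside the kernel.
struct LaunchShape {
    unsigned int blocks;           // value of gridDim.x
    unsigned int threadsPerBlock;  // value of blockDim.x
};

// Records the built-in gridDim.x and blockDim.x for the current launch.
__global__ void probeKernel(LaunchShape *out) {
    // A single thread writes the result; every thread sees the same values.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        out->blocks = gridDim.x;
        out->threadsPerBlock = blockDim.x;
    }
}

// Launches probeKernel with a 1D grid and 1D blocks and returns what it saw.
LaunchShape probeLaunch(unsigned int gridX, unsigned int blockX) {
    dim3 dimGrid(gridX, 1, 1);    // first launch argument: blocks in the grid
    dim3 dimBlock(blockX, 1, 1);  // second launch argument: threads per block

    LaunchShape hostShape = {0, 0};
    LaunchShape *devShape = nullptr;

    cudaError_t err = cudaMalloc(&devShape, sizeof(LaunchShape));
    assert(err == cudaSuccess);

    probeKernel<<<dimGrid, dimBlock>>>(devShape);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    err = cudaMemcpy(&hostShape, devShape, sizeof(LaunchShape),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(devShape);
    return hostShape;
}
```

Calling `probeLaunch(32, 128)` from `main` and printing the two fields should show which number ends up as the block count and which as the thread count per block.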