Hello everyone,
I am trying to write a CUDA kernel that takes N-dimensional inputs and applies a MAD (multiply-add) operation to each input value.
For the 2D case, I was considering the following configuration:
Block - (16, 16, 1)
Grid - (ceil(width / 16), ceil(height / 16), 1)
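Here is a minimal sketch of how that 2D configuration might be used. It assumes a `float` input in row-major layout, and the names `madKernel2D`, `launchMad2D`, `a` and `b` are hypothetical and only for illustration. The integer expression `(n + 16 - 1) / 16` computes ceil(n / 16) without floating point. The bounds check stops threads in the partial edge blocks from writing out of range.

```cuda
#include <cuda_runtime.h>

// Hypothetical element-wise MAD kernel: out[i] = in[i] * a + b.
// Names and the scalar parameters a, b are illustrative assumptions.
__global__ void madKernel2D(const float* in, float* out,
                            int width, int height, float a, float b)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    // Edge blocks overhang the data when width/height are not multiples of 16.
    if (x < width && y < height) {
        int idx = y * width + x;                    // row-major linear index
        out[idx] = fmaf(in[idx], a, b);             // fused multiply-add
    }
}

// Launch with 16x16 blocks and enough blocks to cover every element.
void launchMad2D(const float* d_in, float* d_out,
                 int width, int height, float a, float b)
{
    dim3 block(16, 16, 1);
    dim3 grid((width  + block.x - 1) / block.x,   // integer ceil(width / 16)
              (height + block.y - 1) / block.y,   // integer ceil(height / 16)
              1);
    madKernel2D<<<grid, block>>>(d_in, d_out, width, height, a, b);
}
```

With `width = 37` and `height = 21`, this launches a 3 x 2 grid of 16 x 16 blocks. The threads that fall outside the 37 x 21 region simply do nothing.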
For the N-dimensional case, however, I am a little unclear on how to choose the grid and block configuration, since the number of dimensions is known only at run time.
I would appreciate the community's thoughts on this.