CUDA kernel for ND inputs

Hello everyone,
I am trying to write a CUDA kernel that takes N-dimensional inputs and applies a mad (multiply-add) operation to each value in the input.

For the 2D case, I was considering the following configuration:
Block - (16, 16, 1)
Grid - (ceil(Width / 16), ceil(Height / 16), 1)
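A minimal sketch of that 2D configuration, assuming a row-major `float` array of `width * height` elements and scalar coefficients `a` and `b` (the post does not say what is multiplied and added). The names `mad2d` and `mad2dHost` are hypothetical, and CUDA error checking is omitted for brevity. The grid uses integer ceiling division, so the kernel needs a bounds check for the threads that fall past the edge of the array.

```cuda
#include <cuda_runtime.h>
#include <vector>

// y = a * x + b for every element of a row-major width x height array.
__global__ void mad2d(const float* x, float* y, float a, float b, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    // The grid is rounded up, so edge blocks overhang the array; skip those threads.
    if (col < width && row < height) {
        int i = row * width + col;
        y[i] = fmaf(a, x[i], b);
    }
}

// Hypothetical host helper: copies hx (assumed to hold width * height values)
// to the device, launches 16x16 blocks, and copies the result back.
std::vector<float> mad2dHost(const std::vector<float>& hx, int width, int height, float a, float b) {
    size_t count = (size_t)width * height;
    size_t bytes = count * sizeof(float);
    std::vector<float> hy(count);
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(16, 16, 1);
    // Integer ceiling division: (w + 15) / 16 == ceil(w / 16.0) for w >= 0.
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y, 1);
    mad2d<<<grid, block>>>(dx, dy, a, b, width, height);
    cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return hy;
}
```

With `width = 20`, the grid gets 2 blocks in x, so 12 of the 32 threads in each row of the grid are masked off by the bounds check.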

But for the N-D case, I am a little unclear on how to choose the grid and block configuration, since the number of dimensions is known only at run time.

I would appreciate the community's thoughts on this.

You can map any n-dimensional index to a 1D index (and back). This means you can use a simple 1D block and grid for inputs of arbitrary dimensionality.
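A minimal sketch of that idea, assuming a contiguous row-major `float` input and scalar coefficients `a` and `b`. Because mad touches each element independently, the shape only matters for computing the total element count, so the kernel is 1D regardless of rank. The names `madFlat`, `madND`, and `stridedOffset` are hypothetical; 256 threads per block and the grid cap are arbitrary choices, and CUDA error checking is omitted. `stridedOffset` shows the reverse mapping (flat index to N-D coordinates to memory offset) for cases where the coordinates matter, such as non-contiguous input.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// y = a * x + b over n contiguous elements, whatever the original rank was.
__global__ void madFlat(const float* x, float* y, float a, float b, size_t n) {
    // Grid-stride loop: covers all n elements even if the grid has fewer threads.
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n;
         i += (size_t)blockDim.x * gridDim.x) {
        y[i] = fmaf(a, x[i], b);
    }
}

// Unravels a row-major flat index into per-dimension coordinates and returns
// the memory offset for the given strides. Callable from host or device code.
__host__ __device__ inline size_t stridedOffset(size_t linear, int ndim,
                                                const size_t* shape, const size_t* strides) {
    size_t offset = 0;
    for (int d = ndim - 1; d >= 0; --d) {   // last dimension varies fastest
        size_t coord = linear % shape[d];
        linear /= shape[d];
        offset += coord * strides[d];
    }
    return offset;
}

// Hypothetical host helper: the rank is only known at run time, via shape.size().
std::vector<float> madND(const std::vector<float>& hx, const std::vector<size_t>& shape,
                         float a, float b) {
    size_t n = 1;
    for (size_t extent : shape) n *= extent;   // total elements = product of all extents
    std::vector<float> hy(n);
    if (n == 0) return hy;                     // a zero-sized grid is not a valid launch
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    const int block = 256;
    // Any cap works here; the grid-stride loop handles elements beyond it.
    int grid = (int)std::min<size_t>((n + block - 1) / block, 65535);
    madFlat<<<grid, block>>>(dx, dy, a, b, n);
    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return hy;
}
```

For a contiguous array, `stridedOffset` with row-major strides returns the flat index unchanged, which is why the plain 1D kernel is enough for elementwise work. To call it inside a kernel, the shape and stride arrays would have to live in device memory or be passed by value, for example in a struct with a fixed maximum rank.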

Got it
Thanks for the suggestion