Optimal partitioning of cuda threads

I want to know how to determine the optimal thread allocation and thread block allocation

https://www.olcf.ornl.gov/cuda-training-series/