I want to know how to determine the optimal thread allocation and thread block allocation
https://www.olcf.ornl.gov/cuda-training-series/