I’m trying to understand how NVIDIA GPUs schedule threads. I read that an SP (CUDA core) can handle 4 threads at once, which would explain why the warp size is 32 (4 × 8) on a GPU whose SMs each have 8 CUDA cores. Is the warp size a function of the number of SPs per SM? If an SM has 16 SPs, is the warp size 64?
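
To make the arithmetic concrete, here is the relation I'm assuming, as a small Python sketch. The function name `hypothesized_warp_size` and the default of 4 threads per SP are my own assumptions based on what I read. This is the hypothesis I'm asking about, not confirmed hardware behavior.

```python
def hypothesized_warp_size(sp_per_sm: int, threads_per_sp: int = 4) -> int:
    """Warp size under the assumption being questioned:
    each SP handles `threads_per_sp` threads, and a warp spans every SP in the SM."""
    return threads_per_sp * sp_per_sm


if __name__ == "__main__":
    # 8 SPs per SM is the case I read about; 16 is the case I'm unsure of.
    for sp_count in (8, 16):
        print(sp_count, "SPs per SM ->", hypothesized_warp_size(sp_count), "threads per warp (hypothesis)")
```

If the warp size really is tied to the SP count, the second case should hold on real hardware; if it is a fixed value, this relation only happens to match the 8-SP case.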
Thanks for your help,