16 threads per block would be a very bad design choice, and it is certainly no better than 32 threads per block. To get maximum performance on many GPUs, you need at least 64 threads per block in order to reach maximum occupancy. This topic has been covered in many forum posts, so I’m not going to repeat it here. If you want an orderly introduction to CUDA, I suggest at least the first 4 units here. The answer to “why would 16 threads per block be a bad choice?” is covered there, as well as in many other places.
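For a rough feel of the arithmetic behind the 64-thread figure, here is a minimal sketch. It assumes a GPU whose SMs hold at most 64 resident warps and 32 resident blocks (limits of the kind documented for some recent architectures; check the numbers for your own GPU), and it ignores register and shared-memory limits entirely. The function names and default values are mine, for illustration only.

```python
import math

WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def theoretical_occupancy(threads_per_block,
                          max_warps_per_sm=64,   # assumed per-SM warp limit
                          max_blocks_per_sm=32): # assumed per-SM block limit
    """Fraction of an SM's warp slots that can be filled, considering
    only the warp and block limits (not registers or shared memory)."""
    # A block always occupies whole warps, even if some lanes are unused.
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Resident blocks are capped by both the block limit and the warp limit.
    resident_blocks = min(max_blocks_per_sm,
                          max_warps_per_sm // warps_per_block)
    return resident_blocks * warps_per_block / max_warps_per_sm


def lane_utilization(threads_per_block):
    """Fraction of lanes doing work in the warps a block occupies."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    return threads_per_block / (warps_per_block * WARP_SIZE)


if __name__ == "__main__":
    for threads in (16, 32, 64, 128):
        print(threads,
              theoretical_occupancy(threads),
              lane_utilization(threads))
```

Under those assumed limits, 16 and 32 threads per block both top out at 50% theoretical occupancy because the block limit is hit before the warp limit, while 64 threads per block reaches 100%. On top of that, a 16-thread block leaves half the lanes of its single warp idle, which is why it is no better than 32.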
Thank you! But the link seems to point to a past conference. Are the recordings no longer available?
Everything is still there.
For example, to get to the first unit, click the link I provided. Then scroll down and click on “1. Introduction to …”. Then scroll down and click on the presentation tab. From there you will see a link to the slides and also a link to the recording of the session. Next to the presentation tab, you can also click on the exercises tab to see where the code exercises are.
Oh, yes! I found them! I will learn them by heart. Thank you!
I am engrossed in them now! They are really useful!