I’m currently learning the basic operations in CUDA,
and I do not know how to choose the number of threads and blocks.
For example, suppose the input is 4M floats to be processed and I use 256 threads per block. Must the number of blocks per grid then be 4M/256 = 16384?
If not, how should I choose the number of blocks?
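To make the question concrete, here is a minimal sketch of the two launch styles in question. It assumes "4M" means 4 × 1024 × 1024 float elements. The kernels `scaleOnePerThread` and `scaleGridStride` and the helper `blocksFor` are hypothetical names made up for illustration. The first kernel processes one element per thread, so it needs at least n/256 blocks (rounded up). The second uses a grid-stride loop, so it should work with any number of blocks; the multiplier of 4 blocks per multiprocessor is only an arbitrary starting point for tuning, not a recommendation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: smallest block count with blocks * threadsPerBlock >= n.
__host__ __device__ inline int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
}

// One element per thread: the grid must supply at least n threads in total.
__global__ void scaleOnePerThread(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // guard for a partially filled last block
}

// Grid-stride loop: each thread walks the array in steps of the total
// thread count, so the kernel is correct for any grid size.
__global__ void scaleGridStride(float *data, int n, float factor) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

int main() {
    const int n = 4 * 1024 * 1024;   // assumption: "4M" is an element count
    const int threadsPerBlock = 256;

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Style 1: one thread per element, (n + 255) / 256 = 16384 blocks here.
    int blocks = blocksFor(n, threadsPerBlock);
    scaleOnePerThread<<<blocks, threadsPerBlock>>>(d_data, n, 2.0f);

    // Style 2: a smaller grid sized from the multiprocessor count.
    int device = 0, smCount = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device);
    int fewerBlocks = smCount * 4;   // arbitrary multiplier, to be tuned
    scaleGridStride<<<fewerBlocks, threadsPerBlock>>>(d_data, n, 2.0f);

    // Check launch errors first, then errors raised while the kernels ran.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    printf("style 1 blocks = %d, style 2 blocks = %d, status = %s\n",
           blocks, fewerBlocks, cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

With the first style, 16384 is simply the smallest grid that covers every element at 256 threads per block; with the second, the block count becomes a free tuning parameter.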
Can anyone give me some good advice on how to choose the number of threads and blocks effectively? Right now I’m just using trial and error.
Thanks a lot.