I am new to CUDA. Can someone tell me how to choose the block size? From some of the documentation, I gather that the block size is chosen freely by the programmer. Is it true that I can set any block size up to 768 threads? Here, 768 is the maximum number of threads per SM.
What is the tradeoff between small and large block sizes?
Thanks a lot!