I am fairly new to CUDA and have some questions that I have not been able to find answers to. Please help me with the following:
How does the number of blocks running simultaneously per SM affect performance? As far as I know, at the hardware level threads are executed in warps. So at any given time only the threads of one block can run on an SM, no matter how many resources are available to accommodate more blocks, and having more blocks resident simultaneously doesn't seem to increase performance. For instance, suppose I have a kernel that can be written in two ways: in one, each block uses, say, 14 KB of shared memory, so each SM can hold only one block; in the other, each block uses only 5 KB of shared memory, so each SM can hold two blocks. Which way is better for performance?
What is the maximum amount of texture memory that can be used for general-purpose computation?
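To make the shared-memory scenario in the first question concrete, here is a minimal sketch of the arithmetic I have in mind. It is host-side C++ only and does not query the GPU. The helper name `blocks_by_shared_memory`, the 16 KB of shared memory per SM, and the cap of 8 resident blocks per SM are my assumptions for illustration, not values read from a device. It also considers shared memory alone, while registers and the per-SM thread limit may lower the real figure.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the CUDA API): upper bound on the number
// of blocks that can be resident on one SM when shared memory is the only
// limit considered. Register usage and the per-SM thread limit are ignored.
constexpr std::size_t blocks_by_shared_memory(std::size_t smem_per_sm,
                                              std::size_t smem_per_block,
                                              std::size_t max_blocks_per_sm) {
    return smem_per_block == 0
               // No shared memory used: only the hardware block cap applies.
               ? max_blocks_per_sm
               // Otherwise take the smaller of "how many fit" and the cap.
               : (smem_per_sm / smem_per_block < max_blocks_per_sm
                      ? smem_per_sm / smem_per_block
                      : max_blocks_per_sm);
}
```

With the assumed 16 KB per SM and a cap of 8, `blocks_by_shared_memory(16 * 1024, 14 * 1024, 8)` evaluates to 1, which matches the one-block case above. For the 5 KB variant this bound is only an upper limit, so the number of blocks that actually fit depends on the other per-SM resources as well.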