I know that a block can run up to 512 threads, and that you can have up to 65,535 blocks. I also know that threads within a block can communicate with each other, which is very useful. My main question is this: if you launch 65,535 blocks with 1 thread each on a GPU with 300 CUDA cores, will 300 blocks run at once, i.e. 300 simultaneous operations? Or do you need a block of 512 threads to take advantage of the 300 cores? I know there's something about a warp being 32 threads that run at once. Could someone sum it all up for me? Would 10 blocks of 32 threads each be optimal?
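To make the warp part of the question concrete, here is a rough sketch of the arithmetic as I understand it, in plain host-side C++ (no CUDA calls). It assumes a warp size of 32 and that each block is split into whole warps; the names `kWarpSize`, `warps_per_block` and `lane_utilization` are made up for illustration. It only counts warp lanes per block and says nothing about how many blocks the hardware keeps resident at once, which is the part I'm unsure about.

```cpp
#include <cassert>

// Assumed warp size: threads are scheduled in groups of 32.
constexpr int kWarpSize = 32;

// Number of warps a block of `threads_per_block` threads occupies.
// A partially filled warp still counts as a whole warp, so round up.
int warps_per_block(int threads_per_block) {
    assert(threads_per_block > 0);
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Fraction of the warp lanes in one block that hold an actual thread.
double lane_utilization(int threads_per_block) {
    int warps = warps_per_block(threads_per_block);
    return static_cast<double>(threads_per_block) / (warps * kWarpSize);
}
```

Under these assumptions, a block with 1 thread still occupies one full warp and fills only 1 of its 32 lanes, a block with 32 threads fills its single warp completely, and a block with 512 threads is 16 full warps. The 10 blocks of 32 threads from my last question would come to 320 threads in total.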