I'm trying to figure out whether I have the CUDA threading model right. The statements below are how I currently understand it. If any of them is wrong, could you please tell me where I've made a mistake? Thanks.
Each card has several multiprocessors. Each multiprocessor has 8 processors. Each processor can execute 768 threads at once. The 8 processors have shared memory they can access.
So, for a GeForce 8600 with 4 multiprocessors, I can have a maximum of 4 × 8 × 768 = 24576 threads executing nearly concurrently over 32 blocks, and any remaining threads will be scheduled to run after these threads complete.
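To make the arithmetic I'm assuming explicit, here is a small host-side C++ sketch (no CUDA calls, and the function names are just ones I made up). It computes the total under my reading, where the 768-thread limit applies to each processor, and under the other reading I can think of, where 768 is the limit for a whole multiprocessor. The constants 8 and 768 are the figures from my statements above, not values I've confirmed.

```cpp
#include <cstdint>

// Figures from my post; both are my assumptions, not confirmed hardware specs.
constexpr std::uint32_t kProcessorsPerMultiprocessor = 8;
constexpr std::uint32_t kThreadLimit = 768;

// My current reading: the 768-thread limit applies to each processor.
constexpr std::uint32_t threads_if_limit_is_per_processor(std::uint32_t multiprocessors) {
    return multiprocessors * kProcessorsPerMultiprocessor * kThreadLimit;
}

// Alternative reading: the 768-thread limit applies to each multiprocessor as a whole.
constexpr std::uint32_t threads_if_limit_is_per_multiprocessor(std::uint32_t multiprocessors) {
    return multiprocessors * kThreadLimit;
}
```

With 4 multiprocessors, the first function gives 24576 and the second gives 3072, so which reading is right changes the result by a factor of 8.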
OK, so where exactly do the grids that the SDK docs talk about come into play, and are the rest of my statements correct? Is a grid just the set of blocks that a kernel runs over? Thanks for the help.
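For reference, this is how I currently picture a grid, again as a plain C++ sketch rather than real CUDA. It assumes a one-dimensional grid made of equally sized blocks, with each thread finding its position the way the `blockIdx.x * blockDim.x + threadIdx.x` expression does inside a kernel. The function and parameter names are hypothetical.

```cpp
#include <cstddef>

// Total threads one kernel launch would create, assuming a grid is simply
// a collection of equally sized blocks.
constexpr std::size_t threads_in_grid(std::size_t blocks_in_grid,
                                      std::size_t threads_per_block) {
    return blocks_in_grid * threads_per_block;
}

// Position of one thread within the whole grid (1D case), mirroring
// blockIdx.x * blockDim.x + threadIdx.x.
constexpr std::size_t global_thread_index(std::size_t block_index,
                                          std::size_t threads_per_block,
                                          std::size_t thread_index) {
    return block_index * threads_per_block + thread_index;
}
```

For example, a grid of 32 blocks with, say, 256 threads each would hold 8192 threads, and thread 5 of block 2 would get index 517. If that picture is wrong, that's the part I'd like corrected.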