I have noticed two different ways of allocating shared memory. In the N-body simulation sample, shared memory is declared as extern, and I cannot find any declaration of those extern shared variables other than inside the function that uses them. In the matrix multiplication example in the Programming Guide, on the other hand, the shared arrays are declared without extern.
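To make the comparison concrete, here is a minimal sketch of the two declaration forms as I understand them. The kernel names, the helper `runBothForms`, and the value of `NUMTHREADS` are made up for illustration and are not taken from either sample. The sketch assumes that the unsized extern array gets its size from the optional third launch-configuration argument, which is how the CUDA documentation describes dynamically sized shared memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define NUMTHREADS 128  // hypothetical block size, chosen for illustration

// Form 1: no extern. The array size is a compile-time constant.
__global__ void scaleStatic(const float *in, float *out, int n)
{
    __shared__ float shXStatic[NUMTHREADS];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        shXStatic[threadIdx.x] = in[i];
        out[i] = 2.0f * shXStatic[threadIdx.x];
    }
}

// Form 2: extern and unsized. The size in bytes is supplied at launch
// time as the third argument inside <<< >>>.
__global__ void scaleDynamic(const float *in, float *out, int n)
{
    extern __shared__ float shXDynamic[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        shXDynamic[threadIdx.x] = in[i];
        out[i] = 2.0f * shXDynamic[threadIdx.x];
    }
}

// Hypothetical host helper: runs both kernels on the same input and
// returns true if both produce 2 * in[i] for every element.
bool runBothForms(int n)
{
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess
           && cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMemcpy(d_in, h_in.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int blocks = (n + NUMTHREADS - 1) / NUMTHREADS;
        // Static form: no shared-memory size in the launch configuration.
        scaleStatic<<<blocks, NUMTHREADS>>>(d_in, d_a, n);
        // Dynamic form: third argument is the shared-memory size in bytes.
        scaleDynamic<<<blocks, NUMTHREADS, NUMTHREADS * sizeof(float)>>>(d_in, d_b, n);
        ok = cudaMemcpy(h_a.data(), d_a, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(h_b.data(), d_b, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);

    for (int i = 0; ok && i < n; ++i)
        ok = (h_a[i] == 2.0f * h_in[i]) && (h_b[i] == h_a[i]);
    return ok;
}
```

Calling `runBothForms(1000)` from `main` should be enough to exercise both kernels. The only difference on the host side is the extra size argument in the second launch.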
When I compile for a C1060 with the shared array declared extern, I get compile errors; when I omit the extern, the code compiles and works.
What is the difference between the two, if any? And what actually happens when I declare a shared memory variable shX of size NUMTHREADS? Is one array shX allocated in shared memory per block and shared by all threads of that block, so that NBLOCKS blocks get NBLOCKS separate arrays? Or is a single array allocated for all blocks, with its contents swapped as blocks are switched?
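One way to probe the second question empirically is a kernel like the sketch below. Every thread writes its block index into shX, and after a barrier each thread reads the slot written by a neighbouring thread. The names `tagBlocks`, `eachBlockHasOwnCopy`, and the values of `NUMTHREADS` and `NBLOCKS` are hypothetical. If every block really gets its own shX, each output element should equal the index of the block that wrote it; this is only a probe under that assumption, not a proof.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define NUMTHREADS 128  // hypothetical threads per block
#define NBLOCKS 8       // hypothetical number of blocks

// Each thread tags its own slot with the block index, then reads the
// slot of the next thread in the same block.
__global__ void tagBlocks(int *out)
{
    __shared__ int shX[NUMTHREADS];
    shX[threadIdx.x] = blockIdx.x;
    __syncthreads();  // make all writes in this block visible before reading
    int neighbour = (threadIdx.x + 1) % NUMTHREADS;
    out[blockIdx.x * NUMTHREADS + threadIdx.x] = shX[neighbour];
}

// Hypothetical host helper: returns true if every thread saw only values
// written by its own block.
bool eachBlockHasOwnCopy()
{
    const int total = NBLOCKS * NUMTHREADS;
    std::vector<int> h_out(total, -1);
    int *d_out = nullptr;

    bool ok = cudaMalloc(&d_out, total * sizeof(int)) == cudaSuccess;
    if (ok) {
        tagBlocks<<<NBLOCKS, NUMTHREADS>>>(d_out);
        ok = cudaMemcpy(h_out.data(), d_out, total * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_out);

    // Element i was written by block i / NUMTHREADS.
    for (int i = 0; ok && i < total; ++i)
        ok = (h_out[i] == i / NUMTHREADS);
    return ok;
}
```

The barrier matters here: without `__syncthreads()` a thread could read its neighbour's slot before the neighbour has written it.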