4 questions about Fermi GPU

1 In each SM, where is the "context of blocks" stored? In registers, shared memory, or caches? If in caches, which ones?

2 What is MaxBlocksPerSM for Fermi? Is it 8?

3 For all the kinds of cache (global cache, local cache, L1 cache, L2 cache, constant cache, uniform cache, texture cache, instruction cache), where exactly are they located on the chip? (I can only find the instruction cache and the L2 cache in publications.)
Is L1 cache = uniform cache + instruction cache? Is L2 cache = texture cache + constant cache?
How are they related? What are their capacities? What are their functions?

4 Where can I find a table of IPC (instructions per clock) for Fermi instructions?

I think the CUDA Programming Guide will give you all the answers, particularly Chapter 4, "Hardware Implementation", and Appendix F, "Compute Capability".
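
For question 2, here is a rough sketch of how the per-SM limits in that appendix combine. Registers and shared memory on an SM are partitioned among the resident blocks, so the number of blocks that fit is the smallest of several bounds. The numbers below are the compute capability 2.x values as I remember them from the guide (8 resident blocks, 48 resident warps, 32 K registers, up to 48 KB shared memory per SM), so please check them against your version. The names `SmLimits`, `KernelUsage` and `residentBlocksPerSM` are made up for this example, and the real hardware also rounds register and shared memory allocations to a granularity that this sketch ignores.

```cpp
#include <algorithm>

// Per-SM limits for compute capability 2.x (Fermi), assumed from the
// Compute Capability appendix of the CUDA Programming Guide.
struct SmLimits {
    int maxBlocks   = 8;          // resident blocks per SM
    int maxWarps    = 48;         // resident warps per SM (1536 threads)
    int registers   = 32 * 1024;  // 32-bit registers per SM
    int sharedBytes = 48 * 1024;  // with the 48 KB shared / 16 KB L1 split
    int warpSize    = 32;
    int maxThreadsPerBlock = 1024;
};

// Hypothetical description of what one block of a kernel needs.
struct KernelUsage {
    int threadsPerBlock;
    int regsPerThread;
    int sharedBytesPerBlock;
};

// Simplified estimate: ignores allocation granularity, so the real
// number of resident blocks can be lower than what this returns.
int residentBlocksPerSM(const KernelUsage& k, const SmLimits& sm = SmLimits{}) {
    if (k.threadsPerBlock <= 0 || k.threadsPerBlock > sm.maxThreadsPerBlock) {
        return 0;  // block cannot launch at all
    }
    // Threads are scheduled in whole warps.
    int warpsPerBlock = (k.threadsPerBlock + sm.warpSize - 1) / sm.warpSize;

    int blocks = sm.maxBlocks;                               // block-count bound
    blocks = std::min(blocks, sm.maxWarps / warpsPerBlock);  // warp bound

    int regsPerBlock = k.regsPerThread * warpsPerBlock * sm.warpSize;
    if (regsPerBlock > 0) {
        blocks = std::min(blocks, sm.registers / regsPerBlock);  // register bound
    }
    if (k.sharedBytesPerBlock > 0) {
        blocks = std::min(blocks, sm.sharedBytes / k.sharedBytesPerBlock);  // shared memory bound
    }
    return blocks;
}
```

With these assumed limits, `residentBlocksPerSM({256, 20, 4096})` gives 6 (limited by warps and registers), while small blocks such as `{64, 16, 0}` hit the cap of 8 blocks per SM.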