I have some questions about the multiprocessor architecture that came up while reading ‘cuda_c_programming_guide.pdf’. Here is the part on compute capability 6.x:
So my questions are:
- Where is the read-only constant cache? I can’t find it in the GP104 SM diagram (see below).
- What is the size of this read-only constant cache for each multiprocessor? Is it configurable?
- Does ‘L1/texture cache for reads from global memory’ mean that data goes directly from global memory to the L1/texture cache, or from global memory to the L2 cache and then from the L2 cache to the L1/texture cache? How do the two paths compare in efficiency?
- On Kepler, the fixed-size L1 cache is used to cache accesses to local memory, including register spills. On Maxwell and Pascal, those accesses go through the L2 cache instead, which is shared by all multiprocessors. In that case, how is the number of blocks assigned to one multiprocessor determined?
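
To make the last question concrete, here is a minimal sketch of how the per-device limits and the number of resident blocks per multiprocessor could be queried at runtime. The kernel `dummyKernel`, the block size of 256, and the device index 0 are placeholders of mine, not something taken from the guide. Note that `totalConstMem` reports the total constant memory, not the per-multiprocessor constant cache size, which as far as I can tell is not exposed in `cudaDeviceProp`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel. It is never launched here; it only
// serves as the target of the occupancy query below.
__global__ void dummyKernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;
}

int main()
{
    const int device = 0;  // assumption: the first visible GPU

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Device: %s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("Multiprocessors:          %d\n", prop.multiProcessorCount);
    printf("Total constant memory:    %zu bytes\n", prop.totalConstMem);
    printf("L2 cache size:            %d bytes\n", prop.l2CacheSize);
    printf("Shared memory per SM:     %zu bytes\n", prop.sharedMemPerMultiprocessor);
    printf("32-bit registers per SM:  %d\n", prop.regsPerMultiprocessor);
    printf("Max threads per SM:       %d\n", prop.maxThreadsPerMultiProcessor);

    // Ask the runtime how many blocks of this kernel can be resident on one
    // multiprocessor for the given block size and dynamic shared memory.
    const int blockSize = 256;         // assumption: illustrative block size
    const size_t dynamicSmemBytes = 0; // the placeholder kernel uses none
    int blocksPerSM = 0;
    err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, dummyKernel, blockSize, dynamicSmemBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "occupancy query: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("Resident blocks per SM for dummyKernel at %d threads/block: %d\n",
           blockSize, blocksPerSM);
    return 0;
}
```

This should build with something like `nvcc query.cu -o query` (the file name is arbitrary). The values printed depend on the GPU, so I have not listed any expected output.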