Is there a document that describes the memory model of the Tesla C2050 in more detail?
Especially for the L1 and, more importantly, the L2 cache, it would be great if more detail were available. Specifically, I had the following questions:
The NVIDIA profiler gives hit/miss numbers only for the L1 cache. Is there any tool to get these numbers for the L2 cache?
The L2 cache is shared across all the blocks; what is its impact, specifically for data that is loaded repeatedly within a block?
How does this cache model relax the coalescing requirement? (It's mentioned that the coalescing requirements are not as stringent, but what would be the preferred access pattern, if any? See the sketch after these questions.)
Is there any tool available that provides info on the data access patterns of the GPU?
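To make the access-pattern part of the question concrete, here is a minimal probe I would use (the kernel name, sizes, and launch configuration are my own illustration, not from any NVIDIA document): it copies floats with a configurable offset, so you can time how far off a 128-byte line boundary a warp can drift before throughput drops on a C2050.

#include <cstdio>
#include <cuda_runtime.h>

// Copy in[i + offset] -> out[i]; a nonzero offset shifts every warp
// off the 128-byte boundary that a Fermi L1 cache line corresponds to.
__global__ void copyWithOffset(const float *in, float *out, int n, int offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i + offset];
}

int main()
{
    const int n = 1 << 24;                       // 16M floats (64 MB)
    float *in, *out;
    cudaMalloc(&in,  (n + 32) * sizeof(float));  // slack for the offset
    cudaMalloc(&out,  n       * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int offset = 0; offset <= 32; ++offset) {
        cudaEventRecord(start);
        copyWithOffset<<<(n + 255) / 256, 256>>>(in, out, n, offset);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("offset %2d: %7.3f ms\n", offset, ms);
    }
    return 0;
}

On pre-Fermi hardware this kind of sweep shows a sharp penalty for every misaligned offset; if the cached model really relaxes coalescing, the curve should be much flatter here.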
Thanks, this is one of the very few presentations with more detailed info on the Fermi cards.
I am not referring to the traditional coalescing model. The programming guide describes the coalescing model and goes on to say that with Fermi the coalescing requirements are far less stringent. I was curious about the more specific implications of the L1/L2 caches for memory access patterns. Any info on that would be helpful…
In Section G.4.2 of the Programming Guide they are quite specific about how the L1/L2 caches are involved:
“Each memory request is then broken down into cache line requests that are issued independently. A cache line request is serviced at the throughput of L1 or L2 cache in case of a cache hit, or at the throughput of device memory, otherwise.”
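To put numbers on that sentence (my own arithmetic, not a quote from the guide): a warp of 32 threads each loading a 4-byte float touches 128 bytes, so a contiguous, 128-byte-aligned access is exactly one cache line request, while the same access shifted by one element straddles two lines. A tiny host-side sketch of that line counting:

#include <stdio.h>
#include <stdint.h>

/* Count the distinct 128-byte lines touched when thread t of a warp
   loads the 4-byte word at byte address base + (t + offset) * 4.
   Purely illustrative; 128 bytes matches the Fermi L1 line size. */
static int lines_touched(uintptr_t base, int offset)
{
    uintptr_t first = (base + (uintptr_t)offset * 4) / 128;
    uintptr_t last  = (base + ((uintptr_t)offset + 31) * 4 + 3) / 128;
    return (int)(last - first + 1);
}

int main(void)
{
    printf("aligned, contiguous: %d line request(s)\n", lines_touched(0, 0)); /* prints 1 */
    printf("shifted by 1 float:  %d line request(s)\n", lines_touched(0, 1)); /* prints 2 */
    return 0;
}

The relaxation the guide is describing is that the misaligned case costs one extra line request serviced at cache throughput (on a hit), rather than the larger transaction penalties of the pre-Fermi coalescing rules.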