Hello, I have a question about GPU architecture.
I want to ask whether two different processes can interfere with each other on the bandwidth between the L2 cache and the L1 caches. I'm not sure whether the L1 caches share bandwidth with each other, since they are in separate SMs.
I'm also curious whether the bandwidth between the L2 cache and the L1 caches is the same as the bandwidth between the L2 cache and device memory.
Thank you in advance!
The L1 caches of different SMs are independent of each other. They can only affect one another when several of them have to wait for L2.
L2-to-L1 bandwidth is about 2x-4x (depending on the GPU model) higher than L2-to-global-device-memory bandwidth. (With "to" meaning both read and write, not indicating a direction.) There are no neatly published official numbers, but you can look at third-party benchmarks or the "Dissecting … Architecture" papers. Or look in Nsight Compute at how much of the caches' bandwidth was occupied.
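If you want to see the L2 vs. device-memory gap yourself, a rough microbenchmark sketch like the one below can help (all sizes, launch configurations, and the assumption that 4 MiB fits in L2 are my own guesses, not official figures): it times repeated coalesced reads over a small buffer that should stay L2-resident and over a large buffer that cannot, so the two measured bandwidths should roughly reflect L2 versus DRAM read speed.

```cuda
// Sketch only: compare achieved read bandwidth for an L2-resident buffer
// versus a DRAM-sized buffer. Sizes and launch config are assumptions.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_reads(const float* __restrict__ src, float* sink,
                          size_t n, int iters) {
    size_t idx    = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (int it = 0; it < iters; ++it)
        for (size_t i = idx; i < n; i += stride)
            acc += src[i];                 // coalesced, read-only traffic
    if (acc == -1.0f) sink[idx] = acc;     // never true: keeps the loads alive
}

static double measure_gbs(size_t bytes, int iters) {
    size_t n = bytes / sizeof(float);
    float *src, *sink;
    cudaMalloc(&src, bytes);
    cudaMalloc(&sink, 256 * 1024 * sizeof(float));   // one slot per thread
    cudaMemset(src, 0, bytes);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    sum_reads<<<256, 1024>>>(src, sink, n, 1);       // warm-up: pull into L2
    cudaEventRecord(t0);
    sum_reads<<<256, 1024>>>(src, sink, n, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    cudaFree(src); cudaFree(sink);
    cudaEventDestroy(t0); cudaEventDestroy(t1);
    return (double)bytes * iters / (ms * 1e6);       // GB/s
}

int main() {
    // 4 MiB should fit in L2 on most recent GPUs; 1 GiB certainly does not.
    printf("L2-resident : %.0f GB/s\n", measure_gbs(4u << 20, 1000));
    printf("DRAM-bound  : %.0f GB/s\n", measure_gbs(1u << 30, 10));
    return 0;
}
```

Note this measures what the kernel achieves, not the peak of either path; cross-checking the same run in Nsight Compute's memory workload analysis is more reliable.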
It may be that, within each SM, L1 shares bandwidth with shared memory, as they are based on the same silicon.
Thank you for answering.
I didn't know the L1 caches can only access the L2 cache one at a time. Or is it that there is a certain circumstance under which the L1 caches can only access L2 one at a time, and outside that circumstance they can all access the L2 cache together? I think it would be hard to fully utilize the bandwidth if the L1 caches accessed the L2 cache one at a time all the time.