When will we want to use L1?

I noticed that in GEMM, we will only use SMEM to load A/B, and store C. Seldom will we use L1/TEX. Is there any classic situation we want to use L1?

By the way, I noticed cutlass uses L1/TEX for GEMM on Hopper, but I do not understand how. Is this important? (But cublas does not)

cutlass

cublas

Perhaps in those cases shared memory is used. It is like manual L1.

1 Like

Oh…? So what do you mean?

Using shared memory would not necessarily appear as L1 hit, you have to look at the tables below the graph in Compute Nsight to see the amount of shared memory stores and loads.

If you handle the data caching manually with shared memory, no L1 cache is needed.

1 Like

Yeah, I know. But when will we want to use L1 but not SMEM? I guess, graph problem? Or anything else?

With shared memory you can better control, when to start and when to end storage of data. With shared each thread can access a different bank, so less coalescing requirements.

L1 is simpler to use. If you cannot predict, which data to reuse or it is complicated, e.g. because the timing of different warps needing access to the data is not predetermined or because your overall working set is larger than the shared memory size or your program speed is not bound by it, or your problem and algorithm do not need repeated accesses, like element-wise operations.

Many kernels could be written to either use L1 or shared memory (or use each for different kind of data).

You can also write kernels, which use neither, because they keep the data within warps or threads.

Warps can use shuffle instructions to exchange data (shuffle uses the data shuffling portion of shared memory, but not the memory itself), threads can hold data in registers. If you unroll loops, you can store indexed arrays in registers. For example I have done complex FFT calculations with array size of 64 complex numbers in registers (needs 128 registers) of a single thread. So you can calculate 32 FFTs per warp simultaneously. No need for shared memory.

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.