I have an application where L2 reuse can potentially speed up execution. However, I see that the code runs faster when the L2 hit rate is lower (I have a knob that can vary the L2 hit rate).
So I decided to run a microbenchmark experiment measuring the L2 and device memory bandwidth for my Tesla K40 GPU.
Here is the small test:
#include <stdio.h>
#include <cuda_runtime.h>
#include <helper_cuda.h>

#define SIZE (1024*1024)

// Every block reads the same first 128 elements of b -> high L2 reuse.
__global__ void withl2(int *a, int *b) {
    int tid  = threadIdx.x;
    int tid2 = blockIdx.x*blockDim.x + threadIdx.x;
    a[tid2] = b[tid];
}

// Every block reads its own disjoint slice of b -> no L2 reuse.
__global__ void withoutl2(int *a, int *b) {
    int tid = blockIdx.x*blockDim.x + threadIdx.x;
    a[tid] = b[tid];
}

int main() {
    int *a, *b;
    checkCudaErrors(cudaMalloc(&a, SIZE*sizeof(int)));
    checkCudaErrors(cudaMalloc(&b, SIZE*sizeof(int)));

    int threads = 128;
    int blocks  = SIZE/threads;

    withl2<<<blocks, threads>>>(a, b);
    checkCudaErrors(cudaDeviceSynchronize());

    withoutl2<<<blocks, threads>>>(a, b);
    checkCudaErrors(cudaDeviceSynchronize());

    return 0;
}
In the kernel withl2, every thread block reads the same section of array b (indices 0 to 127). Hence, for almost all blocks, every access to b should hit in the L2 cache.
In the kernel withoutl2, each thread block reads a different section of the array, so there should be no L2 hits at all.
I ensured that the total number of accesses is large by launching a very high number of threads. The L1 cache is out of the picture, since global loads are not cached in L1 by default on Kepler. ECC is disabled on the GPU, giving the maximum possible device memory bandwidth.
Here are the results:

Kernel                    | withl2                  | withoutl2
--------------------------|-------------------------|---------------------------------------
L2 hit rate (reads)       | 99.99%                  | 0%
Execution time (us)       | 45                      | 38
Read bandwidth achieved   | 83 GB/s (L2 throughput) | 99.66 GB/s (device memory throughput)
This is really surprising, since I would expect the withl2 kernel to achieve the better bandwidth, given that virtually all of its reads hit in L2. Can someone please provide an explanation?
Also, is there a reference quoting the L2 bandwidth numbers for the K40 (Kepler) architecture?
Thanks in advance!