I am trying to find the number of banks in the global memory of an NVIDIA Tesla K40 GPU. It is a Kepler-based GPU with 12 GB of DRAM and a 384-bit memory interface.
The official Fermi technical white paper clearly states that there are 6 memory partitions, each with a 64-bit interface (384 bits overall). For Kepler, however, I cannot find the number of banks/partitions in any official documentation.
From a microbenchmark similar to THIS microbenchmark, I get maximum throughput when the data size is a multiple of 128 bytes.
Is this the result of coalesced 128-byte accesses, or is it related to the bank width?
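To make the coalescing hypothesis concrete, here is a minimal sketch of how many aligned memory segments one contiguous warp access would touch. The function name `segments_touched` and all parameter values are my own illustrative assumptions; in particular, the 128-byte segment size is the hypothesis being tested, not a documented property of the K40.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: count how many aligned segments of `segment_bytes`
// are touched when `threads_per_warp` threads each read `bytes_per_thread`
// contiguous bytes starting at `base_addr`.
// The segment size is an assumption passed in by the caller.
std::size_t segments_touched(std::uint64_t base_addr,
                             std::size_t bytes_per_thread,
                             std::size_t threads_per_warp,
                             std::size_t segment_bytes) {
    assert(segment_bytes > 0 && bytes_per_thread > 0 && threads_per_warp > 0);
    // Index of the segment holding the first byte of the access.
    const std::uint64_t first = base_addr / segment_bytes;
    // Index of the segment holding the last byte of the access.
    const std::uint64_t last =
        (base_addr + bytes_per_thread * threads_per_warp - 1) / segment_bytes;
    return static_cast<std::size_t>(last - first + 1);
}
```

Under these assumptions, a 32-thread warp reading 4-byte words from an aligned base address touches one 128-byte segment, while the same access shifted by 64 bytes touches two. If coalescing alone explains the spikes, throughput should drop for misaligned or partial segments regardless of how the DRAM is partitioned.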
In a paper on older architectures, the authors used a partition width of 256 bytes to decide how many elements of a row are assigned to one partition.
These GPUs had a 128/256-bit interface and 8 banks in global memory. What is the relationship between interface width and bank width? Do the spikes in my graph mean that the bank width is 128 bytes?
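For reference, this is how I understand the partition arithmetic from that paper, as a rough sketch. The names `elements_per_partition` and `partition_of` are hypothetical, and the simple round-robin interleaving is my assumption; real hardware may hash addresses differently, and none of these values are confirmed for Kepler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many elements of a row fit in one partition,
// given the partition width and element size in bytes.
std::size_t elements_per_partition(std::size_t partition_bytes,
                                   std::size_t element_bytes) {
    assert(element_bytes > 0);
    return partition_bytes / element_bytes;
}

// Hypothetical helper: partition index of a byte address, ASSUMING
// consecutive chunks of `partition_bytes` are assigned round-robin
// across `num_partitions` partitions.
std::size_t partition_of(std::uint64_t byte_addr,
                         std::size_t partition_bytes,
                         std::size_t num_partitions) {
    assert(partition_bytes > 0 && num_partitions > 0);
    return static_cast<std::size_t>((byte_addr / partition_bytes) % num_partitions);
}
```

With a 256-byte partition width and 4-byte elements, 64 consecutive elements land in one partition, and with 8 partitions the mapping wraps around every 2048 bytes. Note that in this model the partition width (256 bytes) is independent of the interface width in bits, which is exactly the relationship I am unsure about.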