Overview of K80 Architecture

Hi all,

I don’t fully understand the K80 architecture. As I see it, there are two GPUs (GK210) and two memory areas (2x12GB).
My questions are

  1. When I have a kernel that is able to occupy the entire K80, will it run on both GPUs?
  2. Who manages these two memory areas? (It looks like a kind of NUMA architecture?)
  3. If the hardware or CUDA manages everything, what if some part of my kernel wants to access data from both memory areas?
  4. If I launch dynamic parallelism, will the hardware or CUDA take data locality into account? (Maybe the answer is no :) )
  5. Shared memory is bigger. Is it distributed across the two GPUs?
  6. What’s the difference between using 2x K40 and 1x K80?

Thanks in advance

When I have a kernel that is able to occupy the entire K80, will it run on both GPUs?

A single kernel will not occupy both GK210 devices. The K80 appears to the system much as two K40 devices would. To make full use of the K80, you must have a multi-GPU-aware application.
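To illustrate, a minimal multi-GPU sketch in the spirit of txbob's answer: each GK210 enumerates as its own CUDA device, so the host code must drive both explicitly. The kernel name, buffer names, and sizes here are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel; any per-element work would look similar.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int N = 1 << 20;
    const int half = N / 2;
    float *d_buf[2];

    // Launch half the work on each GK210; the K80 shows up as two devices.
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&d_buf[dev], half * sizeof(float));
        myKernel<<<(half + 255) / 256, 256>>>(d_buf[dev], half);
    }

    // Wait for both halves to finish, then clean up.
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(d_buf[dev]);
    }
    return 0;
}
```

The launches are asynchronous, so the two devices run their halves concurrently; the synchronization loop afterward joins them.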

Who manages these two memory areas? (It looks like a kind of NUMA architecture?)

Each GK210 device is independent and independently manages its own 12GB of memory.

If the hardware or CUDA manages everything, what if some part of my kernel wants to access data from both memory areas?

Memory on one GK210 is visible to the other if a mechanism like P2P is used, just as it would be if two K40 devices were used.
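A sketch of what enabling P2P looks like, assuming the two GK210 devices enumerate as device IDs 0 and 1 (real code should check `cudaDeviceCanAccessPeer` first):

```cuda
// Enable bidirectional peer access between the two devices.
cudaSetDevice(0);
cudaDeviceEnablePeerAccess(1, 0);   // device 0 may now access device 1's memory
cudaSetDevice(1);
cudaDeviceEnablePeerAccess(0, 0);   // and vice versa

// After this, a kernel running on device 0 can dereference a pointer
// that was cudaMalloc'ed on device 1, and explicit copies can go
// device-to-device with cudaMemcpyPeer(dst, 1, src, 0, nbytes).
```

Without peer access enabled, a copy between the two memories is staged through host memory instead.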

If I launch dynamic parallelism, will the hardware or CUDA take data locality into account? (Maybe the answer is no :) )

CUDA dynamic parallelism runs on a single GPU device, so in the case of K80 that would be a single GK210 device. It does not automatically use the resources of the other GK210 device.
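For concreteness, a minimal dynamic-parallelism sketch (kernel names are made up; compile with `-rdc=true -arch=sm_37`). The child launch runs on the same GK210 the parent is running on; there is no way here to target the other device:

```cuda
// Child kernel, launched from device code.
__global__ void child(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Parent kernel: the nested launch stays on the same GPU.
__global__ void parent(int *out)
{
    if (threadIdx.x == 0)
        child<<<1, 32>>>(out);
}
```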

Shared memory is bigger. Is it distributed across the two GPUs?

Each GK210 device (compute capability 3.7) has an expanded shared memory system. The shared memory of one GK210 is not usable by the other.

What’s the difference between using 2x K40 and 1x K80?

Conceptually, the usage is quite similar from a programmer’s perspective.

Thank you for the answers, txbob; they are very useful. In short, the K80 is managed the same way as a multi-GPU setup.
However, is there any performance change/improvement for peer-to-peer data transfer? Will it use PCIe?

Peer-to-peer still uses the PCIe connection between the two GK210 devices. It happens to be a PCIe Gen3 link.

Thank you very much for this useful information, txbob.

txbob,

Thank you for your answers. It’s very informative. I have a couple of additional queries:

  1. I assume that the K40s inside the K80 are directly connected through PCIe 3 and do not have to pass data through the host. Am I correct?

  2. Is the connection between the two Titans inside the Titan Z very similar to this? Is there any fundamental difference in the connection between the two GPUs inside the two devices (K80 and Titan Z)?

Thank you in advance.

Yes, on both the Titan Z and the K80, there is a PCIe link (a PLX PCIe switch) that directly connects the two GPU devices. Those two GPU devices can exchange data directly with each other using P2P mechanisms, without going through the host.
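A quick way to confirm the direct path is present, assuming the two GPUs behind the on-board switch enumerate as device IDs 0 and 1:

```cuda
// Query whether each device can access the other's memory directly.
int canAccess01 = 0, canAccess10 = 0;
cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
cudaDeviceCanAccessPeer(&canAccess10, 1, 0);

// If both flags are 1, P2P copies and peer memory accesses go over the
// on-board PCIe switch rather than being staged through host memory.
```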

Minor nitpicking: a K80 does not comprise two K40s. The K40 uses a GK110 (an sm_35 device), while the K80 uses two GK210 (sm_37) devices.

Also, it's worth noting the changes in the GK210 memory subsystem. From AnandTech:
http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu

Doubled registers and shared memory make the K80 (2x GK210) a much more interesting card!