Overview of K80 Architecture

Hi all,

I don’t fully understand the K80 architecture. As I see it, there are two GPUs (GK210) and two memory areas (2x12GB).
My questions are

  1. When I have a kernel that is able to occupy the entire K80, will it run on both GPUs?
  2. Who manages these two memory areas? (It looks like a kind of NUMA architecture?)
  3. If the hardware or CUDA manages everything, what if some part of my kernel wants to access data from both memory areas?
  4. If I launch dynamic parallelism, will the hardware or CUDA take data locality into account? (Maybe the answer is no :) )
  5. Shared memory is bigger. Is it distributed across the two GPUs?
  6. What’s the difference between using 2x K40 and 1x K80?

Thanks in advance

When I have a kernel that is able to occupy the entire K80, will it run on both GPUs?

A single kernel will not occupy both GK210 devices. The K80 appears to the system much as two K40 devices would. To make full use of the K80, you must have a multi-GPU-aware application.
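To illustrate, a minimal multi-GPU sketch in the spirit of txbob's answer: each GK210 enumerates as its own CUDA device, so the host code must drive both explicitly. The kernel name, buffer names, and sizes here are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel; any per-element work would look similar.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int N = 1 << 20;
    const int half = N / 2;
    float *d_buf[2];

    // Launch half the work on each GK210; the K80 shows up as two devices.
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&d_buf[dev], half * sizeof(float));
        myKernel<<<(half + 255) / 256, 256>>>(d_buf[dev], half);
    }

    // Wait for both halves to finish, then clean up.
    for (int dev = 0; dev < 2; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(d_buf[dev]);
    }
    return 0;
}
```

The launches are asynchronous, so the two devices run their halves concurrently; the synchronization loop afterward joins them.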

Who manages these two memory areas? (It looks like a kind of NUMA architecture?)

Each GK210 device is independent and independently manages its own 12GB of memory.

If the hardware or CUDA manages everything, what if some part of my kernel wants to access data from both memory areas?

Memory on one GK210 is visible to the other if a mechanism like P2P is used, just as it would be if two K40 devices were used.
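A sketch of what enabling P2P looks like, assuming the two GK210 devices enumerate as device IDs 0 and 1 (real code should check `cudaDeviceCanAccessPeer` first):

```cuda
// Enable bidirectional peer access between the two devices.
cudaSetDevice(0);
cudaDeviceEnablePeerAccess(1, 0);   // device 0 may now access device 1's memory
cudaSetDevice(1);
cudaDeviceEnablePeerAccess(0, 0);   // and vice versa

// After this, a kernel running on device 0 can dereference a pointer
// that was cudaMalloc'ed on device 1, and explicit copies can go
// device-to-device with cudaMemcpyPeer(dst, 1, src, 0, nbytes).
```

Without peer access enabled, a copy between the two memories is staged through host memory instead.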

If I launch dynamic parallelism, will the hardware or CUDA take data locality into account? (Maybe the answer is no :) )

CUDA dynamic parallelism runs on a single GPU device, so in the case of K80 that would be a single GK210 device. It does not automatically use the resources of the other GK210 device.
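For concreteness, a minimal dynamic-parallelism sketch (kernel names are made up; compile with `-rdc=true -arch=sm_37`). The child launch runs on the same GK210 the parent is running on; there is no way here to target the other device:

```cuda
// Child kernel, launched from device code.
__global__ void child(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Parent kernel: the nested launch stays on the same GPU.
__global__ void parent(int *out)
{
    if (threadIdx.x == 0)
        child<<<1, 32>>>(out);
}
```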

Shared memory is bigger. Is it distributed across the two GPUs?

Each GK210 device (compute capability 3.7) has an expanded shared memory system. The shared memory of one GK210 is not usable by the other.

What’s the difference between using 2x K40 and 1x K80?

Conceptually, the usage is quite similar from a programmer’s perspective.

Thank you for the answers, txbob; they are very useful. In short, the K80 is managed the same way as a multi-GPU setup.
However, is there any performance change/improvement for peer-to-peer data transfer? Will it use PCIe?

Peer-to-peer still uses the PCIe connection between the two GK210 devices. It happens to be a PCIe Gen3 link.

Thank you very much for this useful information, txbob.

txbob,

Thank you for your answers. It’s very informative. I have a couple of additional queries:

  1. I assume that the K40s inside the K80 are directly connected through PCIe 3 and do not have to pass data through the host. Am I correct?

  2. Is the connection between the two Titans inside the Titan Z very similar to this? Is there any fundamental difference in the connection between the two GPUs inside the two devices (K80 and Titan Z)?

Thank you in advance.

Yes, on both the Titan Z and the K80, there is a PCIe link (a PLX PCIe switch) that directly connects the two GPU devices. Those two GPU devices can exchange data directly with each other using P2P mechanisms, without going through the host.
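A quick way to confirm the direct path is present, assuming the two GPUs behind the on-board switch enumerate as device IDs 0 and 1:

```cuda
// Query whether each device can access the other's memory directly.
int canAccess01 = 0, canAccess10 = 0;
cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
cudaDeviceCanAccessPeer(&canAccess10, 1, 0);

// If both flags are 1, P2P copies and peer memory accesses go over the
// on-board PCIe switch rather than being staged through host memory.
```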

Minor nitpicking: a K80 does not comprise two K40s. The K40 uses a GK110 (an sm_35 device), while the K80 uses two GK210 (sm_37) devices.

Also, it's worth noting the changes in the GK210 memory subsystem. From AnandTech:
http://www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu

Doubled registers and shared memory make the K80 (2x GK210) a much more interesting card!