I have six K80 GPUs. In the deviceQuery output I see that some K80s can access one another while others cannot. When I had four, all K80s had peer access with each other. After I put in the fifth and sixth, suddenly GPUs 1 and 2 only have peer access with each other, not with GPUs 3 through 6.
The card with GPUs 3 and 4 sits right next to the card with GPUs 5 and 6 (let's call that side "left of the CPUs"). The card with GPUs 1 and 2 is separate ("right of the CPUs"). Could that be the reason?
Still on CUDA 11.0, not 11.1, but I doubt that makes a difference for deviceQuery.
No SLI is in use (should it be? SLI bridges are cheap these days).
Peer access from GeForce GTX 980 (GPU0) → Tesla K80 (GPU1) : No
Peer access from GeForce GTX 980 (GPU0) → Tesla K80 (GPU2) : No
Peer access from GeForce GTX 980 (GPU0) → Tesla K80 (GPU3) : No
Peer access from GeForce GTX 980 (GPU0) → Tesla K80 (GPU4) : No
Peer access from GeForce GTX 980 (GPU0) → Tesla K80 (GPU5) : No
Peer access from GeForce GTX 980 (GPU0) → Tesla K80 (GPU6) : No
Peer access from Tesla K80 (GPU1) → GeForce GTX 980 (GPU0) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU2) : Yes
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU4) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU5) : No
Peer access from Tesla K80 (GPU1) → Tesla K80 (GPU6) : No
Peer access from Tesla K80 (GPU2) → GeForce GTX 980 (GPU0) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU1) : Yes
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU3) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU4) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU5) : No
Peer access from Tesla K80 (GPU2) → Tesla K80 (GPU6) : No
Peer access from Tesla K80 (GPU3) → GeForce GTX 980 (GPU0) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU4) : Yes
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU5) : Yes
Peer access from Tesla K80 (GPU3) → Tesla K80 (GPU6) : Yes
Peer access from Tesla K80 (GPU4) → GeForce GTX 980 (GPU0) : No
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU5) : Yes
Peer access from Tesla K80 (GPU4) → Tesla K80 (GPU6) : Yes
Peer access from Tesla K80 (GPU5) → GeForce GTX 980 (GPU0) : No
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU4) : Yes
Peer access from Tesla K80 (GPU5) → Tesla K80 (GPU6) : Yes
Peer access from Tesla K80 (GPU6) → GeForce GTX 980 (GPU0) : No
Peer access from Tesla K80 (GPU6) → Tesla K80 (GPU1) : No
Peer access from Tesla K80 (GPU6) → Tesla K80 (GPU2) : No
Peer access from Tesla K80 (GPU6) → Tesla K80 (GPU3) : Yes
Peer access from Tesla K80 (GPU6) → Tesla K80 (GPU4) : Yes
Peer access from Tesla K80 (GPU6) → Tesla K80 (GPU5) : Yes
Is this a system with dual CPUs, or a single CPU that internally comprises two physical chips? To my knowledge, peer access requires that the GPUs (which are PCIe endpoints) be attached to the same PCIe root complex. It appears that in this system there are two root complexes, each connected to half of the PCIe slots. The PCIe root complex is part of the non-core circuitry of the CPU, so two CPUs means two root complexes.
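If you want to double-check this outside of deviceQuery, the same information is available directly from the CUDA runtime via cudaDeviceCanAccessPeer. A minimal sketch (device ordering assumed to match what deviceQuery reports above):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Print the peer-access matrix for all device pairs, mirroring the
    // table that deviceQuery produced above (deviceQuery additionally
    // prints the device names).
    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int src = 0; src < n; ++src) {
            for (int dst = 0; dst < n; ++dst) {
                if (src == dst) continue;
                int canAccess = 0;
                // Reports 1 only when 'src' can directly map and access
                // memory on 'dst' -- in practice this requires both GPUs
                // to sit under the same PCIe root complex (or be linked
                // by NVLink on newer hardware).
                cudaDeviceCanAccessPeer(&canAccess, src, dst);
                printf("Peer access from GPU%d -> GPU%d : %s\n",
                       src, dst, canAccess ? "Yes" : "No");
            }
        }
        return 0;
    }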
There is software (or a system command) that shows the PCIe topology, but the name escapes me at the moment.
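Two likely candidates (assuming a Linux system with the NVIDIA driver installed):

    nvidia-smi topo -m    # GPU-to-GPU connectivity matrix
    lspci -tv             # raw PCIe bus/device tree

In the matrix from nvidia-smi topo -m, GPU pairs that show SYS have to traverse the inter-CPU link (i.e., they sit under two different root complexes), which is exactly the case where peer access is refused here.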
That's probably the reason: two root complexes. It's a ProLiant ML350 Gen8, rack version (not tower version). It has two CPUs, each of which has six cores, so 12 in total, 24 with hyperthreading. The manual doesn't mention anything about a root complex, but it does say that half the memory banks are associated with one CPU and the other half with the other CPU. When I first had one K80 card on one side and one on the other, GPUs 1 and 2 had peer access, GPUs 3 and 4 had peer access, but there was no peer access across. When I had two K80s on the "left" root complex, all four GPUs had peer access. With one more card (two more GPUs) on the "right" side, GPUs 1 and 2 have peer access, GPUs 3 through 6 have peer access, but not across (1,2) x (3,4,5,6). Thank you for pointing me to root complexes; I had never heard of that before.
On my SLI question: it seems SLI only provides shared memory access and doesn't help with compute speed at all, and since each K80 GPU has 12 GB of memory, I run into compute speed problems before I run into memory problems. Some compute-intensive tasks run faster on my dinky 980 (Maxwell) than on a K80 (Kepler). Time to upgrade; prices for used T4s (Turing) are falling fast on eBay now that Ampere is out. And they take only one slot, so conceivably I could put six T4s in the ProLiant and still have a graphics-capable GPU in there as well.
Correct, SLI doesn't help with anything compute-related. It's a low-bandwidth connection primarily used for frame synchronization in 3D graphics, as far as I know.
Some recent high-end GPUs have an NVLink connector, which is a low-latency, high-bandwidth interconnect relevant to GPU compute and is used for peer-to-peer communication, best I know.
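Regardless of the interconnect underneath (PCIe or NVLink), peer-to-peer transfers are driven the same way from CUDA. A minimal sketch, assuming devices 0 and 1 report mutual peer access:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        // Peer access must be reported in both directions to be usable.
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) {
            printf("GPU0 and GPU1 are not peers; nothing to demonstrate\n");
            return 1;
        }

        const size_t bytes = 1 << 20;  // 1 MiB test buffer
        void *onGpu1 = nullptr, *onGpu0 = nullptr;

        cudaSetDevice(1);
        cudaMalloc(&onGpu1, bytes);
        cudaSetDevice(0);
        cudaMalloc(&onGpu0, bytes);

        // Let the current device (0) map allocations on device 1.
        // With this enabled, the copy below goes directly over the
        // fabric (NVLink where available, otherwise PCIe); without it,
        // the runtime stages the transfer through host memory.
        cudaDeviceEnablePeerAccess(1, 0);  // second argument is flags, must be 0

        cudaMemcpyPeer(onGpu0, 0, onGpu1, 1, bytes);  // device 1 -> device 0
        cudaDeviceSynchronize();

        cudaFree(onGpu0);
        cudaSetDevice(1);
        cudaFree(onGpu1);
        return 0;
    }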
As I recall (vaguely), there was a huge jump in efficiency between the Kepler and Maxwell architectures and the GTX 980 was a high-end consumer card, so your observation about relative performance doesn’t strike me as odd. The K80 should still have the upper hand over the GTX 980 in double-precision computation.
Since NVIDIA put a notice in the CUDA 11 release notes stating that support for the remaining Kepler GPUs as well as Maxwell GPUs is deprecated, anybody looking to deploy second-hand GPUs would want to look at GPUs using the Pascal or later architectures. I have not used a T4 (haven't even seen a physical one), so I cannot comment on pros and cons.
Then why does the K80 even have SLI connectors? If SLI is only for graphics / frame rates, and the K80 doesn't even have any graphics outputs (display connectors), what's the point of the SLI connectors?
The K80 is ancient history; I do not recall what it looked like in detail. I do not recall it having an SLI connector. If you see some connector that looks like an SLI connector, it may not actually be one, or at least not a functional one.
No idea, but it is not an SLI connector. I checked all the NVIDIA documentation on the K80 that I could find online, and there is no mention of an SLI connector anywhere. As you stated yourself, for a headless GPU that would not make sense.