How NVLink Will Enable Faster, Easier Multi-GPU Computing

Originally published at:

Accelerated systems have become the new standard for high performance computing (HPC) as GPUs continue to raise the bar for both performance and energy efficiency. In 2012, Oak Ridge National Laboratory announced what was to become the world’s fastest supercomputer, Titan, equipped with one NVIDIA® GPU per CPU – over 18 thousand GPU accelerators. Titan…


I have a query regarding “NVLink Signaling and Protocol Technology” and particularly regarding Atomic operations via NVLink.

“The protocol uses a variable length packet with packet sizes ranging from 1 (simple read request command for example) to 18 (write request with data for 256B data transfer with address extension) flits” ( - Page 35)

From the above text I presume NVLink uses the same packet protocol to perform an atomic operation on a peer GPU. I could not find any whitepaper or reference that describes the protocol or packets in detail. Could you please confirm whether my assumption is true? If not, could you please shed some light on how NVLink performs an atomic operation?

Thank you in advance.
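For context, the wire-level packet format isn't exposed to software, but at the CUDA level a peer atomic over NVLink looks like an ordinary atomic on peer-mapped memory once peer access is enabled. A minimal sketch (assuming two NVLink-connected GPUs with device IDs 0 and 1; the kernel name is illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel launched on GPU 0 that atomically increments a counter
// physically resident in GPU 1's memory. With peer access enabled
// on NVLink-connected GPUs, the atomic is carried over the link.
__global__ void peerAtomicInc(int *peerCounter) {
    atomicAdd(peerCounter, 1);
}

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 reach GPU 1?
    if (!canAccess) { printf("no peer access\n"); return 1; }

    // Allocate the counter on GPU 1.
    cudaSetDevice(1);
    int *counter;
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));

    // From GPU 0, enable access to GPU 1's memory and run atomics on it.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    peerAtomicInc<<<32, 256>>>(counter);
    cudaDeviceSynchronize();

    int result = 0;
    cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDefault);
    printf("counter = %d\n", result);  // expect 32 * 256 = 8192
    return 0;
}
```

Whether the hardware encodes this as the same variable-length flit packets described in the whitepaper is exactly the part that doesn't seem to be documented publicly.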

I am very interested in learning about and incorporating the advantages of NVLink in multi-GPU applications and algorithms. I recently built my first deep learning workstation with two GeForce RTX 2080 Ti cards and hope to upgrade to three or four GPUs down the road. I started implementing and benchmarking models (ResNet, CNNs, etc.) and noticed that when the two GPUs are connected via an NVLink bridge, they are treated as a single device: when I use multi_gpu_model(model, num_gpu) from tensorflow.keras.utils, execution halts with an error because only one GPU is available, not two.
So my question is: where do I go from here, and why does the system "see" only one device, not two? The NVIDIA forums sadly don't offer much support on this, and for the most part cover multi-GPU setups without NVLink. Furthermore, nvidia-smi displays both GPUs, so one would conclude that both devices should be visible to the system when running the CNN models from TensorFlow and Keras. I would definitely like to pursue this and expose the reason(s). Here are the details of my platform: WSL2 Ubuntu 20.04 under Windows 10, AMD Threadripper 1950X CPU, 128 GB DDR4 RAM, Jupyter Notebook with TensorFlow 2.3.0, Python 3.7.8.
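One way to narrow this down is to check, below the TensorFlow layer, whether the CUDA runtime itself enumerates both GPUs. A minimal probe (assuming the CUDA toolkit is installed inside the WSL2 guest; if this prints only one device, the issue is in the driver/WSL2 layer rather than in TensorFlow or Keras):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report how many CUDA devices the runtime sees, and their names.
int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  device %d: %s\n", i, prop.name);
    }
    return 0;
}
```

Compile with something like `nvcc probe.cu -o probe` and run it from the same WSL2 environment the notebook uses.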
thank you,
George Jubran

Are we talking the release of this along with Pascal sometime in 2016?

I would love to hear more about who and what will be "behind" this particular hardware. This is something that warrants a possible revamp of the DeskStar!!

So, the Pascal card looks to be compatible with PCIe slots (x8? x16?), but limited to roughly 25% of the bandwidth compared to using NVLink. Motherboards will need updating, then, and while it is stated that 'servers' will be best accommodated, I've no idea what chipset design will be necessary to handle a true NVLink connection for a Pascal video card.

Consumer level.....?......? When....???

Generally, HBM2 memory living on a graphics card should be accessible by the CPU and even unified. At the least, it should be possible with the new cards in 2016, with their estimated 32 GB of HBM2 memory, to dispense with traditional motherboard system RAM and allow the processor to directly access the graphics card memory as its own, with all the advantages of the significantly increased bandwidth; perhaps DDR4/DDR5 motherboard memory could then act as a swap tier before hitting the disk swap file. The NVLink material doesn't say anything about this.

NVLink is proprietary, much like G-Sync, and that makes it utterly unattractive: purely an interim solution before an open standard hits the shelves. It also paints you into a corner with regard to upgradability and component choice, while coupling you to NVLink-enabled hardware carrying the NVidia price premium.

Bad times. All this amounts to an attack on AMD and, indeed, motherboard manufacturers et al. Just as with G-Sync, NVidia hopes to wreak havoc on the proven standards-based model currently in place by imposing costly intellectual-property licensing and hardware costs on anyone who wishes to use NVLink, including AMD, since if they want to compete they will simply have no choice. Very bad times indeed; boycott this proprietary IP, which is terrible for innovation and competition. FreeSync and PCIe 4 are where we want and need to be from 2016 onward.

If NVidia really cared about the PC industry, they could merge the PCIe 4 spec with an enhanced option that utilizes the NVLink IP without licensing costs: so named, PCIe4-enhanced.

Storm Lake is for communication between nodes, so it competes with InfiniBand. NVLink is for communication within a node. It would be helpful to both Nvidia and AMD if AMD included NVLink on its CPUs and GPUs; otherwise, AMD will have to develop an AMDLink, and there will be no way to connect an Nvidia GPU to an x86 CPU with this type of interconnect.

Now is a good time for disruption in the personal computer market because a lot of people are unhappy with Windows 10. I hope Nvidia has some of their software engineers contributing to Linux desktop projects because that will increase the market for Nvidia's CPU. In addition to that, I bet there is a startup developing a new commercial operating system for desktops and notebooks. Nvidia should help fund them.

A CPU with NVLink would be great. Nvidia should license the Mill CPU or acquire the company Mill Computing Inc. The Mill CPU is a truly impressive new CPU design. When only one thread is being used, Intel's Haswell-E runs at its turbo frequency of 3.6 GHz. A good target for Nvidia's CPU would be at least 1.5x to 2x Intel's single-thread SPECint performance.

At this time, NVLink is used only for GPU-to-GPU communication, with the sole exception of IBM POWER processors. At the consumer level you are unlikely to buy multiple Pascal GPUs per host, and x86 machines don't "speak" NVLink, so you won't have access to it.
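As a sketch of what that GPU-to-GPU communication looks like at the API level (assuming two peer-capable GPUs with device IDs 0 and 1), a direct device-to-device copy that can travel over NVLink when it is present:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Copy a buffer directly from GPU 0's memory to GPU 1's memory.
// On NVLink-connected GPUs this transfer can go over the link
// instead of bouncing through host memory.
int main() {
    const size_t bytes = 1 << 20;  // 1 MiB
    float *src, *dst;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);  // can GPU 1 reach GPU 0?
    printf("peer access 1<-0: %s\n", canAccess ? "yes" : "no");

    // cudaMemcpyPeer works whether or not peer access is enabled;
    // enabling peer access additionally lets kernels dereference
    // peer pointers directly.
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();
    return 0;
}
```

The same calls work over plain PCIe; NVLink just changes which fabric the transfer uses.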

Som'bitch...... Thank you though....