How NVLink Will Enable Faster, Easier Multi-GPU Computing

jwitsoe · November 14, 2014, 2:18am

Originally published at: https://developer.nvidia.com/blog/how-nvlink-will-enable-faster-easier-multi-gpu-computing/

Accelerated systems have become the new standard for high performance computing (HPC) as GPUs continue to raise the bar for both performance and energy efficiency. In 2012, Oak Ridge National Laboratory announced what was to become the world’s fastest supercomputer, Titan, equipped with one NVIDIA® GPU per CPU – over 18 thousand GPU accelerators. Titan…

sajad.karim · December 10, 2014, 2:51pm

Hi,

I have a query regarding “NVLink Signaling and Protocol Technology” and particularly regarding Atomic operations via NVLink.

“The protocol uses a variable length packet with packet sizes ranging from 1 (simple read request command for example) to 18 (write request with data for 256B data transfer with address extension) flits” (https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf - Page 35)

From the above text I presume NVLink uses the same protocol to perform an atomic operation on a peer GPU. I could not find any whitepaper or reference that describes the protocol or its packets in detail. Could you please confirm whether my assumption is true? If not, could you please shed some light on how NVLink performs an atomic operation?
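
For reference, the 1-to-18 flit range in the quoted sentence can be reproduced with simple arithmetic. This is only a sketch under two assumptions that the quoted sentence does not state: that a flit carries 128 bits (16 bytes), and that a write request consists of one header flit, one optional address-extension flit, and the data flits. The function name is hypothetical, and nothing here describes how atomics are actually encoded.

```python
# Assumption: one flit carries 128 bits = 16 bytes (not stated in the quote).
FLIT_BYTES = 16


def write_packet_flits(payload_bytes, address_extension=False):
    """Count flits in a hypothetical write request.

    Assumed layout: 1 header flit, plus 1 address-extension flit if
    requested, plus enough data flits to carry the payload.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    data_flits = -(-payload_bytes // FLIT_BYTES)  # ceiling division
    return 1 + int(address_extension) + data_flits


# 256B write with address extension: 1 header + 1 extension + 16 data flits.
print(write_packet_flits(256, address_extension=True))  # 18
# A header-only packet, comparable to the "simple read request" case.
print(write_packet_flits(0))  # 1
```

Under these assumptions the two extremes match the quoted 1 and 18 flits, which suggests the payload and command share one packet format, but it does not confirm anything about atomics.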

Thank you in advance.

gjubran56 · December 12, 2014, 6:36pm

Hello,
I am very interested in learning about NVLink and incorporating its advantages into multi-GPU applications and algorithms. I recently built my first deep learning workstation with two GeForce RTX 2080 Ti cards and hope to upgrade to three or four GPUs down the road. I started implementing and benchmarking models (ResNet, CNNs, etc.) and noticed that when the two GPUs are connected via an NVLink bridge, they are treated as one device: when I import mult_gpu_model(model, num_gpu) from keras.tensorflow models, execution halts with an error saying that only one GPU is available, not two.
So my question is: how can I proceed from here, and why does the system “see” only one device, not two? The NVIDIA platforms sadly don't offer much support on this and, for the most part, cover multi-GPU methods without NVLink. Furthermore, nvlink-smi displays both GPUs, so one would conclude that both devices should be visible to the system when executing the CNN models from TensorFlow and Keras. I would definitely like to pursue this matter and find the reason(s). Here are the details of my platform: WSL2 Ubuntu 20.04 within Windows 10, AMD Threadripper 1950X CPU, 128 GB DDR4 RAM, Jupyter Notebook with TensorFlow 2.3.0, Python 3.7.8.
Thank you,
George Jubran
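
One way to narrow a problem like this down is to compare the GPU count the driver reports with the count the framework reports. The sketch below covers the driver side only, assuming `nvidia-smi -L` is on the PATH and prints one line per GPU in the form `GPU 0: <name> (UUID: GPU-...)`. The helper names are hypothetical. On the TensorFlow 2.x side, `len(tf.config.list_physical_devices('GPU'))` gives the framework's count.

```python
import re
import shutil
import subprocess

# Assumed line shape of `nvidia-smi -L`: "GPU 0: <name> (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_gpu_listing(text):
    """Return a list of (index, name) for each GPU line found in text."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def driver_visible_gpus():
    """Run `nvidia-smi -L` if available; return [] when it is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return parse_gpu_listing(result.stdout)


if __name__ == "__main__":
    gpus = driver_visible_gpus()
    print(f"driver reports {len(gpus)} GPU(s): {gpus}")
```

If the driver lists two GPUs but TensorFlow lists one, the `CUDA_VISIBLE_DEVICES` environment variable is one thing worth checking, since it restricts which devices a process can see.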

anon16919743 · September 17, 2015, 11:51pm

Are we talking the release of this along with Pascal sometime in 2016?

I would love to hear more on who and what are going to be "behind" this particular hardware. This is something that warrants a possible revamp of the DeskStar!!

anon53448141 · October 4, 2015, 12:43pm

So the Pascal card looks to be compatible with PCIe slots (x8? x16?) but limited (about 25%) compared to using it with NVLink. That is to say, motherboards need updating, and while it is stated that 'servers' will be best accommodated, I have no idea what chipset design will be necessary to handle a true NVLink for the Pascal video card.

anon16919743 · October 23, 2015, 8:15pm

Consumer level.....?......? When....???

anon49557724 · November 8, 2015, 11:55pm

Generally, HBM2 memory living on a graphics card should be accessible by the CPU and even unified. At least, it should be possible with the new cards in 2016, what with their estimated 32GB of HBM2 memory, to dispense with traditional motherboard system RAM and allow the processor to directly access the graphics card memory as its own with all the advantages of the significantly increased bandwidth, where perhaps DDR4/5 motherboard memory acts as a swap-out before hitting the disk swap file. NVLink does not say anything about this.

NVLink is proprietary, much like G-Sync, and this is utterly unattractive. It is purely an interim solution before an open standard hits the shelves. It also paints you into a corner with regard to upgradability and component choice, whilst coupling you to NVLink-enabled hardware carrying the NVIDIA price premium.

Bad times. All this amounts to an attack on AMD and indeed on motherboard manufacturers et al. Just as is the case with G-Sync, NVIDIA hope to wreak havoc on the proven, standards-based model currently in place by imposing costly intellectual property licencing and hardware costs upon anyone who wishes to utilise NVLink, and this includes AMD, since if they want to compete they will simply have no choice. Very bad times indeed. Boycott this proprietary IP, which is terrible for innovation and competition. FreeSync and PCIe 4 are where we want and need to be in 2016 onward.

If NVIDIA really care about the PC industry, they could merge the PCIe 4 spec with an enhanced option which utilises the NVLink IP without licensing costs. Call it PCIe4-enhanced.

anon40213719 · February 4, 2016, 11:05pm

Storm Lake is for communication between nodes, so it competes with InfiniBand. NVLink is for communication within a node. It would be helpful to both NVIDIA and AMD if AMD included NVLink on their CPUs and GPUs. Otherwise, AMD will have to develop an AMDLink, and there will be no way to connect an NVIDIA GPU to an x86 CPU with this type of interconnect.

anon9181423 · February 8, 2016, 12:13am

Now is a good time for disruption in the personal computer market because a lot of people are unhappy with Windows 10. I hope Nvidia has some of their software engineers contributing to Linux desktop projects because that will increase the market for Nvidia's CPU. In addition to that, I bet there is a startup developing a new commercial operating system for desktops and notebooks. Nvidia should help fund them.

A CPU with NVLink will be great. Nvidia should license the Mill CPU or acquire the company Mill Computing Inc. The Mill CPU is a truly impressive new CPU design. When only one thread is being used, Intel's Haswell-E runs at its turbo frequency of 3.6 GHz. A good target for Nvidia's CPU would be at least 1.5x or 2x of Intel's single-thread SPECint performance.

anon96011494 · June 15, 2016, 6:20pm

At this time, NVLink is used only for GPU-to-GPU communication, the sole exception being IBM Power processors. At the consumer level you are unlikely to buy multiple Pascal GPUs per host, and x86 machines don't "speak" NVLink, so you won't have access to it.

anon16919743 · June 15, 2016, 10:19pm

Som'bitch...... Thank you though....
