PCIE3 on Titans

I currently have what I believe to be a PCIe 3.0 compatible motherboard, running Ubuntu:

As well as 8 Titans. All PCIe transfer diagnostics indicate that I'm still running at PCIe 2.0 speeds. Is there any way to force the system to use PCI Gen 3?
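
(For anyone wanting to check the same thing: lspci reports the negotiated link speed in its LnkSta line, where 5 GT/s means Gen 2 and 8 GT/s means Gen 3.)

sudo lspci -vv | grep -i "LnkSta:"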

This should work:

[url]https://devtalk.nvidia.com/default/topic/533200/linux/gtx-titan-drivers-for-linux-32-64-bit-release-/post/3753244/#3753244[/url]

Replace ‘nvidia-313’ with the name of the NVIDIA module on your system. For Ubuntu it could be:
nvidia, nvidia-current, or nvidia-xxx (where xxx is the three-digit version number). Try modinfo followed by each of those names; the one that prints the module's current parameters is the right name.
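
For anyone who doesn't want to dig through the link: as far as I recall it comes down to passing the NVreg_EnablePCIeGen3=1 option when the module loads. Something like this on Ubuntu (substitute your module name for 'nvidia'; the .conf file name is arbitrary):

echo "options nvidia NVreg_EnablePCIeGen3=1" | sudo tee /etc/modprobe.d/nvidia-pcie-gen3.conf
sudo update-initramfs -u

Then reboot and re-check the link speed. Note that the card drops the link down when idle to save power, so check it while something is actually running on the GPU.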

Sweet!
With a bit of fiddling I got that working! Thanks!!!

Interesting though that it's only reaching about 68% of peak (10659.7 MB/s out of the 15750 MB/s theoretical maximum for PCIe 3.0 x16, i.e. 16 lanes × 8 GT/s with 128b/130b encoding):

dwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GTX TITAN
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			10659.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			10649.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			219901.0

Result = PASS
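
In case it's useful for comparison, here's roughly what bandwidthTest is doing for the pinned host-to-device number, boiled down (my own re-write, so treat it as a sketch rather than the sample's actual code):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32u << 20;   // 32 MB, the transfer size the sample reports
    const int reps = 20;

    void *h_buf = 0, *d_buf = 0;
    cudaMallocHost(&h_buf, bytes);    // pinned (page-locked) host memory
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, 0);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host to Device: %.1f MB/s\n", (bytes / (1024.0 * 1024.0)) * reps / (ms / 1000.0));

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}

Pinned (cudaMallocHost) memory is what lets the copy run near full PCIe speed; with ordinary pageable memory the numbers drop a lot.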

That’s quite similar to the performance I measured, around 11300 MB/s, on a Supermicro X9DRG (C602 chipset) and a Gigabyte GA-Z87X-OC (Z87 chipset), both with TITAN.

Looking deeper into it, it seems that CUDA isn't fully utilizing all of the PCIe lanes on my motherboard (I have 8 cards, each with 16 PCIe lanes).

Trying to do a ring of transfers (0>1 1>2 2>3 3>4 4>5 5>6 6>7 7>0) results in:
cudaMemcpyPeer / cudaMemcpy bandwidth per gpu: 1.24GB/s

A partial transfer 0>1 2>3 4>5 6>7 gives
cudaMemcpyPeer / cudaMemcpy bandwidth per gpu: 2.33GB/s

Then 0>1 4>5 gives
cudaMemcpyPeer / cudaMemcpy bandwidth per gpu: 5.51GB/s
(these are on two completely separate PCIe branches, AND a separate CPU controls each transfer, so they're completely independent)

And 0>1 by itself
cudaMemcpyPeer / cudaMemcpy bandwidth per gpu: 11.80GB/s

In theory it should be no different from the 0>1 1>0 transfer bandwidth of ~10.5 GB/s.
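
For concreteness, the test is essentially the following (a stripped-down sketch, not the exact code I ran): each pair gets its own stream on the source device, every copy is issued with cudaMemcpyPeerAsync, and bandwidth is bytes per pair over the wall-clock time for the whole batch, since CUDA events can only time a single device's stream.

#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

int main() {
    // Example pattern: the "partial" set 0>1 2>3 4>5 6>7. Swap in the ring,
    // the two-pair case or a single pair to try the other patterns above.
    const int srcDev[] = {0, 2, 4, 6};
    const int dstDev[] = {1, 3, 5, 7};
    const int nPairs   = 4;
    const size_t bytes = 256u << 20;   // 256 MB per copy
    const int reps     = 10;

    void*        src[8];
    void*        dst[8];
    cudaStream_t stream[8];

    for (int i = 0; i < nPairs; ++i) {
        cudaSetDevice(srcDev[i]);
        // Real code should check cudaDeviceCanAccessPeer first; without P2P the
        // copy still works but is staged through host memory.
        cudaDeviceEnablePeerAccess(dstDev[i], 0);
        cudaMalloc(&src[i], bytes);
        cudaStreamCreate(&stream[i]);
        cudaSetDevice(dstDev[i]);
        cudaMalloc(&dst[i], bytes);
    }

    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        for (int i = 0; i < nPairs; ++i) {
            cudaSetDevice(srcDev[i]);
            cudaMemcpyPeerAsync(dst[i], dstDev[i], src[i], srcDev[i], bytes, stream[i]);
        }
    for (int i = 0; i < nPairs; ++i) {
        cudaSetDevice(srcDev[i]);
        cudaStreamSynchronize(stream[i]);
    }
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    double sec = std::chrono::duration<double>(t1 - t0).count();
    printf("bandwidth per pair: %.2f GB/s\n", (double)bytes * reps / sec / 1e9);
    return 0;
}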

Edit: I've attached a diagram of the motherboard setup. It should make it obvious that the 0>1 and 4>5 transfers have nothing to do with each other, and so should not be slowed down by one another at all.