Linux PCI-E 3.0 Support

Hello,

I sent a request to NVIDIA support about this issue a few months back, but as far as I know there has been no resolution to the problem yet. I provided bug-report logs when they were requested at the time.

I use my Kepler 680 cards for compute purposes (CUDA/OpenCL), and the PCI-E bandwidth is very beneficial for this use. With driver 295.33, all of my slots negotiate at 8 GT/s per the Rev 3.0 spec on several different boards I own, and this currently works fine. Any driver after 295.33 is capped at 5 GT/s, as reported by lspci and nvidia-settings. I have tried every driver up to the 310.14 beta, and all of them are limited to 5 GT/s.
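For reference, the negotiated speed can be read straight out of lspci's LnkSta field. A minimal sketch (the bus address and the sample output line below are assumptions for illustration, not taken from my system):

```shell
# On real hardware, query the GPU's link status (01:00.0 is an example address):
#   sudo lspci -vv -s 01:00.0 | grep 'LnkSta:'
# The parse step below is demonstrated against a captured sample line.
sample="LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+"
echo "$sample" | sed -n 's/.*Speed \([0-9]*GT\/s\).*/\1/p'   # prints 8GT/s
```

A card capped by the driver will report `Speed 5GT/s` here even in a Gen3 slot.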

Are there any plans to fix this issue? I would like to be able to upgrade my driver to a newer version at some point and this will be necessary if I am ever to buy any future released NVIDIA cards.

Thanks for your time.

Screenshots of nvidia-settings:

295.33: https://docs.google.com/open?id=0B3fM8POXHOrVYkctcXZfMGRJSVk
304.10: https://docs.google.com/open?id=0B3fM8POXHOrVaVJVSEc0OWNwYU0

And you see a drop in performance too?

Yes, the performance drop is approximately 15% when going from 8 GT/s to 5 GT/s with the application I primarily run.

I recently found a handy tool in the NVIDIA CUDA Toolkit called bandwidthTest. I compiled it with CUDA Toolkit 4.2.9; the version built with CUDA Toolkit 5.0.35 requires at least driver 304.54, so I used the 4.2.9 build instead.

Here are the results at 5 GT/s (any driver after 295.33) versus 8 GT/s (driver 295.33):

5 GT/s - Size: 33554432 bytes

Host to Device Bandwidth: 5990.8 MB/s
Device to Host Bandwidth: 6396.6 MB/s

8 GT/s - Size: 33554432 bytes

Host to Device Bandwidth: 11712.8 MB/s
Device to Host Bandwidth: 12129.4 MB/s

The bandwidth measurement is nearly double with driver 295.33.
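That roughly matches theory: per lane, 8 GT/s with 128b/130b encoding carries nearly twice the payload of 5 GT/s with 8b/10b encoding. A back-of-envelope sketch of the x16 numbers (decimal MB/s, ignoring packet and protocol overhead):

```shell
# Theoretical x16 payload bandwidth before packet/protocol overhead.
# Gen2: 5 GT/s per lane, 8b/10b encoding  -> 4 Gb/s usable per lane.
# Gen3: 8 GT/s per lane, 128b/130b encoding -> ~7.88 Gb/s usable per lane.
gen2=$(awk 'BEGIN { printf "%.0f", 5e9 * 8/10   / 8 * 16 / 1e6 }')
gen3=$(awk 'BEGIN { printf "%.0f", 8e9 * 128/130 / 8 * 16 / 1e6 }')
echo "Gen2 x16: ${gen2} MB/s"   # 8000
echo "Gen3 x16: ${gen3} MB/s"   # 15754
```

The measured ~12 GB/s host-to-device at 8 GT/s is in line with real-world efficiency against that ~15.75 GB/s ceiling, and the ratio between the two generations is about 1.97, matching the near-doubling seen in bandwidthTest.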

You’re right that PCIe 3.0 was disabled because of reliability issues with the current generation of PCIe controllers, such as the one in the X79 chipset platform. There exists an NVIDIA tool on Windows to opt-in to unsupported PCIe 3.0 support, but there isn’t a Linux equivalent to that tool yet. I filed an enhancement request and will update that thread when there’s news.

Thanks,

  • Pierre-Loup

Thanks for the information and for putting in a request. Is there a link to the site where the requests are filed? Otherwise, if you could post an update in this thread when there is news, I would appreciate it.

I also wanted to mention that PCI-E 3.0 support with the one driver that supports it has been running stably on my systems with the Kepler cards, with seven months of full load on the GPUs. I have not run into any reliability issues.

The request number is 1171212; there is no publicly accessible tracker but I will update this thread when the bug gets resolved.

Where can we see the status of this? I really want PCIe 3 enabled.

Hi,

My system has the following specs:

  • Intel Core i7-3820
  • Gigabyte GA-X79-UD5
  • 2x GeForce GTX 680 (Gigabyte GV-N680OC-2GD)
  • CentOS 6.3 (64-bit)
  • CUDA Toolkit 4.2

310.14 driver and pinned memory:
Host to Device Bandwidth: 5985.9 MB/s
Device to Host Bandwidth: 6390.9 MB/s

295.33 driver and pinned memory:
Host to Device Bandwidth: 11192.6 MB/s
Device to Host Bandwidth: 12178.8 MB/s

I will do further tests with my own codes.

On a similar note, how is SLI support these days? I remember people reporting very minimal performance gains, or even a performance loss. Is this still the case?

Thanks for sharing your bandwidth results. Those numbers are very similar to what I have seen on two of my systems. I hope that NVIDIA can resolve this issue and remove the restriction.

I have not tested this in a long time. I think the last time I had SLI enabled on Linux was with a pair of 6800 GT or 8800 GT cards. Unfortunately, there were few applications available at the time that could take advantage of SLI. Now that Steam is pushing to port games to Linux, SLI should be of greater benefit to Linux users.

True. I’m mainly considering this because I am thinking of getting a 1440p monitor, but I don’t know if performance will drop too severely with a single 660. I mostly play games under Wine, and in that case it’s quite possible that the CPU bottleneck would be too great, which would mean SLI wouldn’t help much even if it worked correctly. It would also mean that, assuming there is enough video RAM, 1440p probably won’t carry a huge additional performance hit. But it’s hard to say without actually testing such a configuration. I’ve found the relationship between CPU, GPU, and video settings is often less obvious on Linux (with Wine) than on Windows.

I agree that working SLI would be great for native games at least, provided that they are heavy enough on the GPU side to drop below 60fps. I’m not sure that any of the current Source games would do that actually with a 660 or greater system, but perhaps future games will.

A future driver release will add a kernel module parameter, NVreg_EnablePCIeGen3=1, that will enable PCIe gen 3 when possible. Please note that this option is experimental and many systems are expected to have stability problems when this option is enabled. Use it at your own risk.
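For anyone trying this once the driver lands, here is a sketch of how such a module parameter is typically set (the modprobe.d file name is arbitrary, and the bus address is an example; per the warning above, treat this as an unsupported, at-your-own-risk toggle):

```shell
# Persistent: have modprobe pass the option whenever the nvidia module loads.
echo "options nvidia NVreg_EnablePCIeGen3=1" | sudo tee /etc/modprobe.d/nvidia-pcie3.conf

# One-off test (the nvidia module must not already be loaded):
#   sudo modprobe nvidia NVreg_EnablePCIeGen3=1

# Afterwards, confirm the negotiated speed:
#   lspci -vv -s 01:00.0 | grep 'LnkSta:'   # bus address is an example
```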

Aaron,

I loaded driver 310.32 with kernel 3.7.5 and my cards are negotiating at PCI-E 3.0. So far everything is looking good. I ran bandwidthTest and the numbers are similar to driver 295.33.

Thank you very much for adding this option to the latest Linux NVIDIA drivers. I very much appreciate it. The CUDA application I run performs best with PCI-E 3.0 enabled. I also appreciate that the option can be set as a kernel module parameter: I am running a small BusyBox-based image that does not include X, and this lets me enable PCI-E 3.0 and run my CUDA apps without having to add X.

As Aaron stated, the nvidia driver option doesn’t work everywhere unfortunately.
I just bought a system which was supposed to be a Gen3 GPU compute machine.
It has an Intel W2600CR motherboard with dual E5-2620 and 4 x ASUS GTX 670 cards.
None of them are recognized as Gen3.
I tried the latest NVIDIA driver with “NVreg_EnablePCIeGen3=1” applied, but when I run a simple deviceQuery, the system crashes and reboots.
If I run deviceQuery without this option enabled, everything is fine.
I’m running RHEL 6.3.
Are there any other server motherboards that would take the dual E5-2620 CPUs and 4 x ASUS GTX 670 and guaranteed run at proper Gen3 speed?
It seems that the whole Gen3 thing is a “scam” when it comes to these boards. If a board claims to support Gen3 (as this Intel board does), then it should deliver it; if it can’t, it shouldn’t be allowed to claim it, because in the end it’s the users who get caught in the middle of the mess.

Unfortunately, I have not worked with any of the dual-socket server boards. However, I know Supermicro has an extensive line of boards supporting E5-2600 processors and a few supporting E5-4600 processors. It may be worth checking with them whether they have done the testing necessary to ensure PCI-E 3.0 works. One board you may want to look at is the X9DRG-QF. It appears to support five dual-slot cards, and Supermicro designed it to take advantage of all 80 lanes from both processors, plus an additional 4 PCI-E 2.0 lanes connected to the unused DMI link of the second CPU. I very much like the layout of this board and hope it handles the PCI-E 3.0 spec correctly. ASUS also has a dual-socket 2011 board supporting four dual-slot cards, the Z9PE-D8.

@kangaroo2013

Does the BIOS for that motherboard have an option to enable PCIe 3.0 speeds? That might be why it’s crashing, though it could also just be poor support. Just a thought, in case you hadn’t tried it already.

I wanted to add that this kernel module option worked for me. See below:
https://devtalk.nvidia.com/default/topic/533200/linux/gtx-titan-drivers-for-linux-32-64-bit-release-/post/3753244/#3753244