Need some information on driver streams and CUDA

Can you please explain or point to a resource on the long-lived/short-lived drivers? How do we learn when one has been promoted from short-lived to long-lived, rather than just someone having made a mistake on the site? Also, how should one figure out what driver version is the minimum for a release of CUDA?

375 appears to no longer be the long-lived driver, despite just having seen it listed a few weeks ago, and 384 is nowhere to be found now. Is there some sort of development cycle I could look at to help me make good choices and provide information to our user community about what we’ll be doing when?

I have a system where I have to maintain compatibility with a range of cards and compiled software, e.g. M2070 to V100, and software compiled with CUDA 7.x-9.x. We also can’t constantly be manipulating NVIDIA driver versions. Is there a way for me to find out what’s recommended to provide reasonably good compatibility and a reasonably stable driver?

Take the runfile installer for that version, and inspect what driver is contained in it. That is the minimum. If you would like a historical summary, you can find one here:

https://stackoverflow.com/questions/30820513/what-is-version-of-cuda-for-nvidia-304-125/30820690#30820690
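
For installers whose file name embeds the bundled driver version, the check can be sketched in Python. This is only a sketch: it assumes a name of the form `cuda_<toolkit>_<driver>_linux.run` (for example, something like `cuda_9.0.176_384.81_linux.run`), and `bundled_driver` is a hypothetical helper, not an NVIDIA tool. Runfiles that do not follow this pattern have to be unpacked and inspected instead.

```python
import re

# Assumed naming pattern: cuda_<toolkit version>_<driver version>_linux[.run|-run]
_RUNFILE = re.compile(r"^cuda_(\d+(?:\.\d+)+)_(\d+(?:\.\d+)+)_linux(?:[.-]run)?$")


def bundled_driver(runfile_name):
    """Return (toolkit, driver) version strings, or None if the name does not match."""
    m = _RUNFILE.match(runfile_name)
    return (m.group(1), m.group(2)) if m else None


# Example file name; treat it as an illustration of the pattern only.
print(bundled_driver("cuda_9.0.176_384.81_linux.run"))
```

A `None` result just means the name carries no driver version, not that the installer contains no driver.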

This may also be of interest:

https://stackoverflow.com/questions/28932864/cuda-compute-capability-requirements/28933055#28933055

In my experience, advance notification of future development cycles is released under NDA only.

Within certain ranges, the “latest driver” should support any previously available GPU. It is also usually the recommended choice, since it picks up any bug fixes, and it will be compatible with all previous CUDA toolkits, again within range limits.

CUDA 9/9.1 dropped support for Fermi devices (compute capability 2.x, including the mentioned M2070). Therefore if you want to provide that range of support, it’s going to be more challenging.

Anything back through Kepler (cc 3.x and higher) is supported properly by current drivers (e.g. R390) and latest CUDA toolkits (9.0/9.1).

Machines with Fermi (or prior) devices should probably be provisioned separately. All other scenarios should be supportable from the latest driver/toolkit combo, and the latest driver will work with any CUDA toolkit in the range you mention (7.0 - 9.1) using Kepler or newer devices.
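
The split described above can be written down as a small Python sketch. The function names and pool labels are made up for illustration; the compute capabilities are those of the cards mentioned in this thread, and the rule (cc 2.x stops at CUDA 8.x, cc 3.x and newer can use 9.0/9.1) is only what is stated above, not an official support matrix.

```python
# Compute capability (major, minor) of the cards mentioned in this thread.
GPUS = {"M2070": (2, 0), "P100": (6, 0), "V100": (7, 0)}


def newest_usable_toolkit_major(cc):
    """Newest CUDA major version (within 7.x-9.x) that can target this GPU."""
    major, _minor = cc
    if major >= 3:   # Kepler or newer: supported through CUDA 9.0/9.1
        return 9
    if major == 2:   # Fermi: support dropped in CUDA 9.0/9.1
        return 8
    raise ValueError("cc 1.x and older are outside the scope of this sketch")


def pools(gpus):
    """Split GPUs into the two provisioning groups suggested above."""
    result = {"fermi-or-older": [], "kepler-or-newer": []}
    for name, cc in sorted(gpus.items()):
        key = "kepler-or-newer" if cc[0] >= 3 else "fermi-or-older"
        result[key].append(name)
    return result


print(pools(GPUS))
```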

As a final note, machines updated to newer Linux kernel versions (generally speaking, beyond 4.10) will generally require an R390 driver or newer.
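
As a rough aid for that last point, here is a Python sketch that flags kernels newer than 4.10. The helper name and the default threshold are assumptions taken from the rule of thumb above; distribution kernels carry backports, so treat the result as a hint rather than a verdict.

```python
import re


def likely_needs_r390(kernel_release, threshold=(4, 10)):
    """Rule of thumb from this thread: kernels beyond 4.10 generally need R390 or newer."""
    m = re.match(r"(\d+)\.(\d+)", kernel_release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    return (int(m.group(1)), int(m.group(2))) > threshold


# The release string is what `uname -r` (or platform.release()) reports.
print(likely_needs_r390("4.15.0-20-generic"))
```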

Thanks, txbob. Some followups:

  1. Can you also go into the long-lived and short-lived driver? Is there a sort of ball-park figure for how often annually a long-lived driver changes? What, in general, is meant by long-lived vs. short-lived, and where can one be notified about a change in the long-lived driver?

  2. As far as what to look out for before upgrading either the CUDA toolkit or the driver, it would just be the README in either package to make sure that hardware isn’t deprecated?

  3. If code is compiled against CUDA 8.0, for example, and someone loads the CUDA 9.0 toolkit and tries to run on a CC 2.0 card, will this fail? I guess I’m asking if there’s a difference between compile time and runtime requirements.

To make sure I’m clear on the rest:

  1. Code compiled against older versions of CUDA should continue to work with newer drivers, provided the driver supports the card. So, for example, since even driver 390 supports M2070, I should not have any problems with any software compiled in the past or going forward against CUDA earlier than 9.0 on those cards? Is there some point at which a driver becomes too new for a version of CUDA, or just for the hardware?

  2. Any code compiled against CUDA >= 9.0 will simply not work on any GPUs with less than CC 3.0.

Thanks for helping to clarify.

I’m not allowed to make forward-looking statements about what to expect in the future. That is not the purpose of these forums, generally. You’re allowed to ask whatever you wish; I’m just not allowed to respond to every question. In general, short-lived drivers will pick up most changes (enhancements, bug fixes) that are in view of the development team. The long-lived drivers are intended to be more “stable” and will only pick up certain changes that meet a set of criteria. I’m not allowed to detail this out any further. AFAIK, the ways to get push notifications from NVIDIA are:

  1. subscribe to our blogs http://blogs.nvidia.com
  2. become a registered developer http://developer.nvidia.com
  3. subscribe to newsletters http://www.nvidia.com/object/newsletter.html

I’m not sure any of these provide a push notification every time a new driver is released.
Regarding driver notices, you should be able to subscribe, for example, to this board:

https://devtalk.nvidia.com/default/board/99/

and get notices of various linux driver releases.

I would refer you to the release notes in either case. When hardware gets deprecated by a CUDA toolkit, the CUDA toolkit with the deprecation usually provides a splashy notice (a warning every time you compile code for that architecture). If you haven’t noticed, somebody is not paying attention.

I would expect code compiled for CUDA 8.0 to work on any machine that has a CUDA 8.0 or newer driver, assuming the driver installed supports the GPU installed, and assuming the code has been compiled for that target architecture, and assuming the code does not have any unmet dynamic/runtime link dependencies on e.g. the CUDA runtime library. If you have a CUDA 8.0 binary compiled for a CC 2.0 device, and you have statically linked against the CUDA 8 libraries (or provide them in a redistributed fashion), your code should run even if CUDA 9 toolkit is installed on that machine.
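
Those conditions can be read as a checklist. The Python sketch below is one way to write it down; the function and parameter names are invented for illustration, CUDA versions and compute capabilities are passed as `(major, minor)` tuples, and the architecture test is a simple exact match on the compiled targets.

```python
def unmet_conditions(driver_cuda, built_with, gpu_cc, built_for,
                     driver_supports_gpu, runtime_libs_available):
    """Return the conditions from the paragraph above that are NOT met."""
    problems = []
    if driver_cuda < built_with:
        problems.append("driver is older than the toolkit used to build")
    if not driver_supports_gpu:
        problems.append("installed driver does not support this GPU")
    if gpu_cc not in built_for:
        problems.append("binary was not compiled for this architecture")
    if not runtime_libs_available:
        problems.append("CUDA runtime libraries are neither static nor bundled")
    return problems


# CUDA 8.0 binary built for cc 2.0 and 3.5, on a cc 2.0 card with a CUDA 9.0 driver.
print(unmet_conditions((9, 0), (8, 0), (2, 0), {(2, 0), (3, 5)}, True, True))
```

An empty list means none of the listed conditions is violated; it does not prove the binary will run.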

Yes, your statement is correct, with a wrinkle: CUDA 8 tools (e.g. debugger, profiler, cuda-memcheck, etc.) may not work correctly on a CUDA 9 driver on cc 2.0 hardware. But your compiled binary should work, subject to the limitations previously mentioned. Yes, there are points at which a driver becomes too new for a particular piece of hardware. AFAIK R340 drivers were the last to support cc 1.x devices, and we might see that a future branch does not support cc 2.x devices, for example. GPUs are not supported “forever”.

correct

Thanks, txbob. I’ve noticed somewhat frequent mentions in these forums about what is not allowed to be shared. I don’t want to give you a hard time, but I figure you probably have a line to the people who are making the decision about what you’re allowed to say. Speaking for myself, I just want to support the hardware that we’ve purchased that I believe is under warranty, and to be able to plan our next move for our organization. I don’t think we’re coming anywhere close to NDA territory there, and I don’t know of any other company we do business with that would ever bring up an NDA in the context of wanting to know what a short-lived vs. long-lived driver is. Is there another support mechanism I should be using instead of the forum? This is sort of where I ended up when I clicked through the various “this is what I need help with” links. Is there also a support/ticketing system where customers can ask questions?

That said, my questions are basically answered, so far as what driver to choose.

I suspect we have a signed NDA anyway, and I can investigate this stuff through that channel as well.

You can pay for enterprise support.

http://www.nvidia.com/object/enterprise-services.html

This is unpaid/community support you are getting here.

The ticketing system for defects is by filing a bug at developer.nvidia.com. After becoming a registered developer, you can click on your name in the upper right hand corner, and navigate to your account page where you can see bugs that you have filed and file new ones.

For general “I Have a question” where you don’t want to use community support (i.e. this forum) you can use this portal:

https://developer.nvidia.com/contact

The NDA comment was in reference to what to expect for a release schedule for Long-term driver drops, specifically this question of yours:

how often annually a long-lived driver changes

That is a forward looking statement: it sets an expectation for future delivery of something. I cannot go there. Sorry. You’re welcome to take a look at the date-stamped releases of drivers in a long-lived branch, and arrive at your own historical estimate of an answer to this question. I am not allowed to set expectations for anything in the future.
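
If you want to do that historical estimate yourself, a Python sketch along these lines would work. The helper name is hypothetical and the dates in the example are placeholders, not real release dates; substitute the date stamps you collect from the driver pages of the branch you care about.

```python
from datetime import date


def mean_days_between(release_dates):
    """Average gap, in days, between consecutive releases."""
    ordered = sorted(release_dates)
    if len(ordered) < 2:
        raise ValueError("need at least two release dates")
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Placeholder dates only, NOT actual driver release dates.
example = [date(2000, 1, 1), date(2000, 1, 11), date(2000, 1, 31)]
print(mean_days_between(example))  # 15.0
```

The result describes past behavior only and says nothing about future releases.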

I’ve noticed you’ve cross-posted the question in the Linux graphics forum. That’s probably a good place for Linux driver-specific questions. I don’t have any more precise description of long-lived vs. short-lived available to me at the moment, but the moderator there may have one. Not sure.

Thanks for the clarifications on the types of support – much appreciated. Can you confirm that no period of enterprise support is provided with the purchase of hardware – it’s always paid, from day 1?

Also, sorry, two final specific questions related to current driver versions and then I think I’m out of your hair! :-D

  1. If I go to this page: http://www.nvidia.com/object/unix.html …I see that the latest long-lived driver is 390.48. If I click on that driver, the “supported products” lists only consumer-class hardware. If I go to the driver search and choose Tesla, P-Series, P100, Linux 64-bit RHEL7, CUDA 9.1, I get 390.46. Is this just somehow not up to date yet, or do they release slightly different point releases for the consumer cards vs. the enterprise cards?

  2. If I go to the drivers search page and select Tesla, M-Class, M2070, Linux 64-bit RHEL7, the choices are CUDA 8.0, 9.0, and 9.1. Shouldn’t this not list 9.x at all? It does seem to filter different CUDA versions based on what sort of hardware, etc. you select.

Correct, unless at time of purchase you have negotiated something. That is rare, in my experience. The simple act of purchasing Tesla hardware does not provide you with any automatic paid or enterprise support, as defined by the web page I previously linked.

For Tesla hardware, the correct method to choose the “latest driver” that I referred to previously is to use the wizard. The wizard is where you select Tesla, P-series, P100, …
Having said that, the differences between 390.46 and 390.48 are likely to be quite isolated (although they are not zero - there is something in there that matters to someone). I wouldn’t be able to describe the differences beyond what is published in the release notes. NVIDIA releases drivers according to a variety of needs. I won’t be able to give an exhaustive treatment of that here.

Yes, I agree, it should not list 9.0 and 9.1, for Fermi-class hardware (any cc2.x device). However there is a bit of a gray area here as R390 drivers do support Fermi, and they also do support CUDA 9.0/9.1. The fundamental restriction on Fermi in CUDA 9.0/9.1 comes about via the CUDA toolkit, not the driver.

OK, thanks again, txbob.