Different OptiX versions vs. different graphics cards and drivers

Hello,
I have noticed that an application built with different OptiX versions may or may not run, depending on the graphics card and driver.
For example:
Card GeForce GT 630. Driver version 344.41.
The application built with OptiX 3.0.0 runs, and the same application built with OptiX 3.6.3 runs too. OK!

Card GeForce GTX 745. Driver 340.52 WHQL.
The application built with OptiX 3.0.0 does not run, but the same application built with OptiX 3.6.3 runs.

Card GeForce GTX 745. Driver 347.52 or other drivers dated 2015.
The application built with OptiX 3.0.0 does not run, and the same application built with OptiX 3.6.3 does not run either.
Usually there is an exception at optix::Context->launch().
How can we manage this situation and sell the software product?
Can anyone give us advice?
Thanks.

Current OptiX versions can only support the streaming multiprocessor (SM) targets that were supported by the CUDA version each OptiX version was built with.
This means applications which want to target a newer GPU architecture require a newer OptiX version and driver. Simply replacing the OptiX DLL with a newer one inside an existing application is meant to work as long as the input PTX code isn’t touched and the DLLs have the same name (the OptiX API is backward compatible).

However, backward compatibility is also limited to the streaming multiprocessor targets supported by the underlying CUDA drivers. Newer CUDA drivers have dropped support for SM versions below SM 2.0 (Fermi), and so have the most recent OptiX versions.

In your case:

  • GT 630 boards exist with Fermi and Kepler chips. You’d need to run the CUDA Toolkit deviceQuery example or OptiX’s sample3, which queries the GPU’s SM version, to see which architecture it is. OptiX 3.0.0 added support for Kepler GPUs, and OptiX 3.6.3 also fully supports Fermi and Kepler GPUs, so both work.
  • GTX 745 is based on the Maxwell GPU architecture, which is supported in OptiX 3.6.3 but not in OptiX 3.0.0 because that release predates the Maxwell architecture.
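The two cases above can be sketched as a small lookup. This is only an illustration, not an official support matrix: the names `archName`, `OptixRelease`, and `supportsArch` are made up for this example, and the table encodes nothing beyond what is stated in this thread. The SM major version itself would come from deviceQuery, or from the `major` field of `cudaDeviceProp` in the CUDA runtime.

```cpp
#include <string>

// Map a compute capability major version to its architecture family.
// Only the generations discussed in this thread are listed.
std::string archName(int smMajor) {
    switch (smMajor) {
        case 2: return "Fermi";
        case 3: return "Kepler";
        case 5: return "Maxwell";
        default: return "unknown";
    }
}

// Hypothetical enum for the two OptiX releases compared in this thread.
enum class OptixRelease { V3_0_0, V3_6_3 };

// Assumption: this encodes only what the thread states. Anything the
// thread does not mention is conservatively reported as unsupported.
bool supportsArch(OptixRelease release, int smMajor) {
    switch (release) {
        case OptixRelease::V3_0_0:
            return smMajor == 2 || smMajor == 3;                  // Fermi, Kepler
        case OptixRelease::V3_6_3:
            return smMajor == 2 || smMajor == 3 || smMajor == 5;  // plus Maxwell
    }
    return false;
}
```

For the GTX 745 (SM 5.x), `supportsArch(OptixRelease::V3_0_0, 5)` is false while `supportsArch(OptixRelease::V3_6_3, 5)` is true, which matches the behaviour reported with driver 340.52.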

“How can we manage this situation and sell the software product?”

Currently you’re responsible for defining the supported configurations of your application based on the OptiX version it was built with. Always use the newest OptiX version available.
I’m not sure whether this will change with the next major OptiX version, which uses a different compilation approach in its core. For more information on these changes, have a look at the most recent OptiX presentations here: http://on-demand-gtc.gputechconf.com/gtcnew/on-demand-gtc.php

Also, when developing commercial products, please make sure you read the OptiX licensing terms, which changed with OptiX 3.5.0 and newer versions.

Thank you very much for the answer.

Thanks for explaining this, very useful.

My application was running with no issues until driver update 358.50, and it now throws an exception on the first raytracing pass. I’m working with Maxwell GPUs and OptiX 3.6.3, and the exception is… 700 - Unknown error.

Does “driver non-regression testing” take the OptiX framework’s features into account?
Any idea how to make sure I publish an application that won’t suddenly crash after a driver update?

Thanks!

You would be amazed by the amount of QA which happens on a driver, but there is always the chance that things behave differently after a driver change.

Sometimes fixes inside drivers even uncover problems inside applications. For example, in a very recent case, upgrading to a newer OptiX version and driver made an application fail because there was an uninitialized variable in the application’s device CUDA C code.

If everything else is the same and the only change is updating the display driver from one which worked before (version number?) to the 358.50 driver, that would need a reproducer to be investigated. Search for OPTIX_API_CAPTURE on this forum to find posts that explain how to produce an API trace.

Additionally, at least this information would be needed:
OS version and bitness, installed GPU(s), display driver version (for regressions: which worked, which failed), OptiX version, CUDA toolkit version the app was built with.

Other than that, there have been a lot of changes in OptiX, CUDA toolkits, and drivers for Maxwell GPUs since OptiX 3.6.3 was released. I would definitely recommend trying the newest OptiX version available.
If replacing the OptiX DLL with a newer version works, or if rebuilding the application with a newer OptiX SDK works, that would be your solution. If not, then a reproducer from the newer OptiX version has a better chance of being investigated.

I’ll try the OPTIX_API_CAPTURE to see if something can be fixed in my code, then I’ll update everything and release a new version. Thanks for the hints & quick answer!

So… I updated everything: CUDA 7.5, OptiX 3.8.0, CG 3.1.0013 and the driver to the latest version.
Then I rebuilt everything and added OPTIX_API_CAPTURE. The trace is readable. As far as I understand, if res = 0 everything went OK, and if not, something went wrong. Did I get that right?

The call to validate() works and the program crashes at the compile() call. I tried simply removing it and letting OptiX compile when I call rTrace, but that changed very little.

I checked every single variable and buffer declared in my CUDA code: all variables are declared and initialized on the host before validate() and compile();
also, there’s no CUDA variable initialized on the host side that is not used in the device code.

The trace ends with:

rtContextValidate( 0000000012C1A830 )
res = 0
rtContextCompile( 0000000012C1A830 )
res = 1281
rtContextGetErrorString( 0000000012C1A830, 1281, 0000000000229BB8 )
res = 0

Is there any way to find out what res = 1281 means?
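For longer traces, the "res = 0 means OK" reading can be automated. The following is a minimal sketch that assumes the trace has exactly the layout shown above (one call per line, followed by a `res = N` line with no leading whitespace); the helper name `findFailedCalls` and the `FailedCall` struct are made up for this example and are not part of OptiX.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One API call whose recorded result was non-zero.
struct FailedCall {
    std::string call;  // the trace line describing the call
    int result;        // the numeric value after "res = "
};

// Scan a trace in the "call line, then res = N line" layout and
// return every call whose result is not 0.
std::vector<FailedCall> findFailedCalls(const std::string& trace) {
    std::vector<FailedCall> failures;
    std::istringstream in(trace);
    const std::string prefix = "res = ";
    std::string line;
    std::string lastCall;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.compare(0, prefix.size(), prefix) == 0) {
            const int code = std::stoi(line.substr(prefix.size()));
            if (code != 0) failures.push_back({lastCall, code});
        } else if (!line.empty()) {
            lastCall = line;  // remember the call this result belongs to
        }
    }
    return failures;
}
```

Run on the six trace lines above, this reports a single failure: the rtContextCompile call with result 1281.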

You’ll need to use CUDA 7.0, not CUDA 7.5. As is explained in the release notes, OptiX 3.8.0 supports only CUDA toolkits up to 7.0.
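A mismatch like this can be caught early with a version check. The sketch below assumes the toolkit version is available as an integer in the `CUDART_VERSION` encoding (major * 1000 + minor * 10, so 7000 for CUDA 7.0 and 7050 for CUDA 7.5); the helper names and the constant `kMaxToolkitForOptix380` are made up for this example, and the limit itself is taken from the statement above.

```cpp
// Toolkit version split into its two parts.
struct ToolkitVersion {
    int majorVer;
    int minorVer;
};

// Decode an integer in the CUDART_VERSION encoding (major*1000 + minor*10).
constexpr ToolkitVersion decodeCudartVersion(int v) {
    return ToolkitVersion{v / 1000, (v % 1000) / 10};
}

// True if version a is strictly newer than version b.
constexpr bool isNewerThan(ToolkitVersion a, ToolkitVersion b) {
    return a.majorVer > b.majorVer ||
           (a.majorVer == b.majorVer && a.minorVer > b.minorVer);
}

// Assumption taken from this thread: OptiX 3.8.0 supports toolkits up to 7.0.
constexpr ToolkitVersion kMaxToolkitForOptix380{7, 0};

constexpr bool toolkitSupported(int cudartVersion) {
    return !isNewerThan(decodeCudartVersion(cudartVersion), kMaxToolkitForOptix380);
}
```

In a real build, something like `static_assert(toolkitSupported(CUDART_VERSION), "...")` after including the CUDA runtime header would turn the mismatch into a compile-time error instead of a runtime failure.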

RTresult 1281 (or 0x501 in hexadecimal) is RT_ERROR_INVALID_VALUE. You can probably get more information on this by reading the error string returned by rtContextGetErrorString(). You can find the list of all RTresult values in optix_declarations.h.
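Since the header lists the codes in hexadecimal while the trace prints them in decimal, a small conversion helper can save some searching. This is a sketch only: `toHex` and `knownResultName` are made-up names, and the table deliberately contains just the two values that appear in this thread; optix_declarations.h remains the authoritative list.

```cpp
#include <sstream>
#include <string>

// Format a decimal result code the way it is written in the header.
std::string toHex(int code) {
    std::ostringstream out;
    out << "0x" << std::hex << std::uppercase << code;
    return out.str();
}

// Partial lookup: only the codes mentioned in this thread are listed.
const char* knownResultName(int code) {
    switch (code) {
        case 0x0:   return "RT_SUCCESS";
        case 0x501: return "RT_ERROR_INVALID_VALUE";
        default:    return "(not in this table, see optix_declarations.h)";
    }
}
```

Here `toHex(1281)` yields "0x501" and `knownResultName(1281)` yields "RT_ERROR_INVALID_VALUE".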

Brilliant, nljones.

I got my app working with OptiX 3.8.0 and found that my exception-catching code was not working for some specific exceptions… because of my extremely naive implementation.

CUDA 7.5 was the problem in the end.
You should put a “Donate” link in your answers.