OptiX 6.0.0 fails with driver 425.25

Hi,

I am running an OptiX application on Windows Server 2016 with four Tesla P100 GPUs and NVIDIA driver 425.25, but I get the error “Failed to load OptiX library.”

On my dev-PC I got the same error before upgrading to a newer driver version, but upgrading on the server does not remove the error for me.

Anything else that I might be doing wrong…? Thanks in advance for any help! (On my dev PC I have CUDA 9.2 installed - not sure that affects anything…?)

nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 425.25       Driver Version: 425.25       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...   TCC | 00000000:2D:00.0 Off |                    0 |
| N/A   31C    P0    25W / 250W |      0MiB / 16298MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...   TCC | 00000000:31:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |   1169MiB / 16298MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...   TCC | 00000000:A9:00.0 Off |                    0 |
| N/A   33C    P0    24W / 250W |   1036MiB / 16298MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...   TCC | 00000000:B5:00.0 Off |                    0 |
| N/A   33C    P0    24W / 250W |   1036MiB / 16298MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

EDIT: Oops, I looked at the wrong OS.

Have you tried the official 419.69 drivers for your Tesla P100 from the www.nvidia.com driver download page?
The OptiX 6.0.0 core implementation resides inside the driver.
I would recommend using 430.xy driver versions when available.

Please check if the system installation contains an nvoptix.dll inside the system directories. If not, that’s the issue.
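
A minimal sketch of that check, assuming Python is available on the machine: it walks a directory tree and lists every copy of a given file name. The helper name find_dll and the default search root are illustrative assumptions, not part of OptiX or the driver.

```python
import os

def find_dll(root, name="nvoptix.dll"):
    """Return the sorted full paths of every file called `name` below `root`.

    The comparison is case-insensitive, as file names are on Windows.
    """
    matches = []
    # os.walk skips directories it cannot read instead of raising.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower() == name.lower():
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)

if __name__ == "__main__":
    # Assumption: a default install where SystemRoot points at the Windows folder.
    system32 = os.path.join(os.environ.get("SystemRoot", r"C:\Windows"), "System32")
    for path in find_dll(system32):
        print(path)
```

Searching from System32 also covers the DriverStore\FileRepository subfolders, so the output shows both whether the DLL exists and where it sits.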

Here are some related posts which might be helpful (for Linux users):
[url]https://devtalk.nvidia.com/default/topic/1047611/optix/optix-error-failed-to-load-optix-library/post/5316243/#5316243[/url]
[url]https://devtalk.nvidia.com/default/topic/1051577/optix/centos7-optix-error-a-supported-nvidia-gpu-could-not-be-found/post/5338170/#5338170[/url]
[url]https://devtalk.nvidia.com/default/topic/1050039/optix/optix-6-0-0-inside-nvidia-docker/post/5329213/#5329213[/url]

I first checked for nvoptix.dll; it is present in both of these folders:
C:\Windows\System32\DriverStore\FileRepository\nv_dispswi.inf_amd64_b299c2f3f9b29d45
C:\Windows\System32\DriverStore\FileRepository\nv_dispwi.inf_amd64_b72a78695f0a2b0a

Are those locations correct?
On the desktop where this works (Windows 10, driver 430.64), nvoptix.dll is in:
C:\Windows\System32\DriverStore\FileRepository\nv_desktop_ref4i.inf_amd64_87fe2a70eb0ad3fe

I then reverted the driver to 419.69, and additionally found nvoptix.dll in:
C:\Windows\System32\DriverStore\FileRepository\nv_dispswi.inf_amd64_518b99f0278fcd26

In the device manager, under driver details I see a link to nvoptix.dll in that location.

Any other suggestions on how to debug this? I see a great speedup with OptiX 6.0.0 in tests on my desktop card and would really like to get it running on the server.

Unfortunately I have no experience with any Windows Server OS. I’ll ask around.

Are other CUDA applications working?
Is there any GPU virtualization used on the system?

There were cases where Linux drivers from distributions hadn’t picked up the newly added library, and when reading the nvidia-smi dump my brain derailed to a Linux setup.

That the OptiX DLL is there indicates that the driver installation is OK.
It appears multiple times inside the Windows driver store because the OS does not delete old drivers, in case you need to revert.

You do not need a CUDA Toolkit to be installed to run OptiX applications. The CUDA driver version and the OptiX version in the display driver are what matter.
For OptiX 6.0.0 development it’s recommended to use CUDA 10.0.
Always read the OptiX Release Notes below the OptiX download button on the developer.nvidia.com site before setting up a development environment.

I was thinking that the 425.25 driver might not be working, but if 419.69 shows the same issue, and it is from the same major branch as the 418.81 minimum version required for OptiX 6.0.0, I can’t say why it’s not working.
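
The version comparison behind that reasoning can be sketched in a few lines of Python. The function names are made up for illustration, and the only fact taken from this thread is the 418.81 minimum for OptiX 6.0.0. The sketch assumes driver versions are dot-separated integers with a two-digit minor part.

```python
def parse_driver_version(text):
    """Turn a version string such as '425.25' into a tuple of ints, here (425, 25)."""
    return tuple(int(part) for part in text.strip().split("."))

# Minimum driver for OptiX 6.0.0 as stated in this thread.
OPTIX_600_MIN_DRIVER = parse_driver_version("418.81")

def meets_minimum(version_text, minimum=OPTIX_600_MIN_DRIVER):
    """Return True if the driver version is at least `minimum` (tuple comparison)."""
    return parse_driver_version(version_text) >= minimum

if __name__ == "__main__":
    # Driver versions mentioned in this thread.
    for version in ("419.69", "425.25", "425.31"):
        print(version, meets_minimum(version))
```

Comparing tuples rather than floats or strings keeps the major and minor parts separate, so the check does not depend on how many digits each part has.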

The same application compiled for OptiX 5.1.1 works without problems.

No virtualization is being used.

I have tried uninstalling CUDA, and recompiled the code on the host against CUDA 10.1 - same problem, “Failed to load OptiX library”.

Would be great to hear if anyone has any suggestions on how to debug this on Windows Server.

I also tried the pre-compiled samples from the OptiX SDK; optixHello and optixConsole, for example, give me the same error.

primeSimple.exe however executes successfully.

I am using the server through RDP; I am not sure if that makes any difference.

I have now run Process Monitor on the process optixConsole.exe, and see that on the Windows Server 2016 machine it tries to find the nvoptix.dll in C:\windows\system32, where it isn’t, but on the working desktop (Windows 10) it looks in the appropriate DriverStore folder and finds it.

I’m happy to share the Process Monitor logs in a PM, if you have time to analyze them.

Windows Server 2016:
10:21:10.8484903 AM optixConsole.exe 18880 CreateFile C:\Windows\System32\nvoptix.dll NAME NOT FOUND Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a

Windows 10:
10:19:48.9627260 AM optixConsole.exe 9912 CreateFile C:\Windows\System32\DriverStore\FileRepository\nv_desktop_ref4i.inf_amd64_87fe2a70eb0ad3fe\nvoptix.dll SUCCESS Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
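
To pull such lines out of a larger capture, a sketch like the following could be used. It assumes the Process Monitor log was saved as CSV with "Path" and "Result" columns. Those column names and the helper dll_lookups are assumptions, so adjust them to the actual export.

```python
import csv
import io

def dll_lookups(csv_text, dll_name="nvoptix.dll"):
    """Return (path, result) pairs for every logged row whose path ends in dll_name."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = row.get("Path") or ""
        # Match on the final path component, ignoring case as Windows does.
        if path.lower().endswith("\\" + dll_name.lower()):
            hits.append((path, row.get("Result") or ""))
    return hits

if __name__ == "__main__":
    # Sample rows modelled on the Windows Server 2016 log line above.
    sample = "\n".join([
        '"Process Name","PID","Operation","Path","Result"',
        r'"optixConsole.exe","18880","CreateFile","C:\Windows\System32\nvoptix.dll","NAME NOT FOUND"',
    ])
    for path, result in dll_lookups(sample):
        print(result, path)
```

Running it over the captures from both machines would list, side by side, every location where the loader looked for the DLL and whether the lookup succeeded.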

Yes, all OptiX versions before 6.0.0 contain the OptiX core implementation inside the SDK’s optix..dll shipped with the applications. It moved into the display driver with OptiX 6.0.0 and the optix..dll is just a wrapper.

That’s the problem. With which driver was that? Both 419.69 and 425.25?
If that system has only four Tesla P100 in TCC mode, which vendor’s display driver is running when accessing it bare metal and when running via RDP?
I’ll file a bug for analysis.

I copied nvoptix.dll and nvrtum64.dll to c:\windows\system32\ and now my application works.
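
For anyone repeating this workaround, here is a hedged sketch of the copy step in Python. The helper copy_missing is hypothetical. The source folder has to be the driver store folder of the currently installed driver (the one Device Manager lists under driver details), and writing to System32 needs an elevated prompt. Manually copied files are presumably not touched by later driver installs, so they would need to be refreshed or removed after a driver change.

```python
import os
import shutil

# The two files named in the post above.
WORKAROUND_FILES = ("nvoptix.dll", "nvrtum64.dll")

def copy_missing(source_dir, target_dir, names=WORKAROUND_FILES):
    """Copy each named file from source_dir to target_dir unless it already exists there.

    Returns the list of file names that were actually copied.
    Raises FileNotFoundError if a file is missing from source_dir.
    """
    copied = []
    for name in names:
        source = os.path.join(source_dir, name)
        target = os.path.join(target_dir, name)
        if not os.path.isfile(source):
            raise FileNotFoundError(source)
        if not os.path.exists(target):
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            copied.append(name)
    return copied
```

A call would pass the chosen driver store folder as source_dir and C:\Windows\System32 as target_dir. Existing files are left alone, so a second run copies nothing.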

I ran the last test with driver 425.31 - I will roll back to 425.25.

I don’t have physical access (the datacenter is a few hundred miles away…); the RDP login shows Microsoft Basic Display Adapter as the fifth display adapter.

Ok, thanks.
We’ll check what can be done about the OptiX DLL search location.

This is just a report.
I hit the same issue, and the workaround from comment 9 (copying nvoptix.dll and nvrtum64.dll into C:\Windows\System32) works.

Environment:
AWS EC2 p3.2xlarge instance
Windows Server 2016 Base.
426.00 for Tesla V100

Thank you for investigating, mbglo4q!