I am running an OptiX application on Windows Server 2016 with 4 x Tesla P100 GPUs and Nvidia driver 425.25, but I get the error “Failed to load OptiX library.”
On my dev-PC I got the same error before upgrading to a newer driver version, but upgrading on the server does not remove the error for me.
Anything else that I might be doing wrong…? Thanks in advance for any help! (On my dev PC I have CUDA 9.2 installed - not sure that affects anything…?)
Have you tried the official 419.69 drivers for your Tesla P100 from the www.nvidia.com driver download page?
The OptiX 6.0.0 core implementation resides inside the driver.
I would recommend using 430.xy driver versions when available.
Please check if the system installation contains an nvoptix.dll inside the system directories. If not, that’s the issue.
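A quick way to run that check is to search the system directory, which also contains the driver store, for the file. This is only a minimal Python sketch, not an NVIDIA tool: the helper name find_library and the default C:\Windows fallback are assumptions made for illustration.

```python
import os
from pathlib import Path


def find_library(name, roots):
    """Return every path under the given roots whose file name matches name.

    The comparison ignores case because Windows file names are case-insensitive.
    """
    matches = []
    for root in roots:
        # os.walk skips unreadable directories and yields nothing
        # when the root itself does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename.lower() == name.lower():
                    matches.append(Path(dirpath) / filename)
    return sorted(matches)


if __name__ == "__main__":
    # Assumed location of the system directory on a standard installation.
    system32 = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "System32"
    for path in find_library("nvoptix.dll", [system32]):
        print(path)
```

Walking all of System32 can take a while; passing the DriverStore\FileRepository folder as the only root narrows the search.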
I first checked for nvoptix.dll; it is located in both of these folders:
C:\Windows\System32\DriverStore\FileRepository\nv_dispswi.inf_amd64_b299c2f3f9b29d45
C:\Windows\System32\DriverStore\FileRepository\nv_dispwi.inf_amd64_b72a78695f0a2b0a
Are those locations correct?
On the desktop where this works (Windows 10, driver 430.64) the nvoptix.dll is in:
C:\Windows\System32\DriverStore\FileRepository\nv_desktop_ref4i.inf_amd64_87fe2a70eb0ad3fe
I then reverted the driver to 419.69, and additionally found nvoptix.dll in:
C:\Windows\System32\DriverStore\FileRepository\nv_dispswi.inf_amd64_518b99f0278fcd26
In the device manager, under driver details I see a link to nvoptix.dll in that location.
Any other suggestions on how to debug this? I see a great speedup when changing to OptiX 6.0.0 in the tests on my desktop card, and would really like to get that running on the server.
Unfortunately I have no experience with any Windows Server OS. I’ll ask around.
Are other CUDA applications working?
Is there any GPU virtualization used on the system?
There were cases where Linux drivers from distributions hadn’t picked up the newly added library, and when reading the nvidia-smi dump my brain derailed to a Linux setup.
That the OptiX DLL is there indicates that the driver installation is ok.
That it appears multiple times inside the Windows driver store is an OS feature: old drivers are not deleted in case you need to revert.
You do not need a CUDA Toolkit to be installed to run OptiX applications. The CUDA driver version and the OptiX version in the display driver are what matters.
For OptiX 6.0.0 development it’s recommended to use CUDA 10.0.
Always read the OptiX Release Notes below the OptiX download button on the developer.nvidia.com site before setting up a development environment.
I was thinking that the 425.25 driver might not be working, but if 419.69 shows the same issue and is from the same major branch as the required 418.81 minimum version for OptiX 6.0.0, I can’t say why it’s not working.
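To make the version reasoning explicit, here is a small Python sketch that compares driver versions against the 418.81 minimum mentioned above. The function names are made up for this example, and it assumes driver versions are always written as dot-separated integers.

```python
def parse_driver_version(text):
    """Turn a version string such as '419.69' into a tuple of ints (419, 69)."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum driver version for OptiX 6.0.0 as stated in this thread.
OPTIX_6_0_0_MIN_DRIVER = parse_driver_version("418.81")


def meets_minimum(installed, minimum=OPTIX_6_0_0_MIN_DRIVER):
    """Compare versions numerically, component by component."""
    return parse_driver_version(installed) >= minimum


if __name__ == "__main__":
    # Driver versions that appear in this thread.
    for version in ("419.69", "425.25", "425.31", "430.64"):
        print(version, meets_minimum(version))
```

All four drivers pass this check, which is why the version alone does not explain the failure on the server.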
I have now run Process Monitor on the process optixConsole.exe, and see that on the Windows Server 2016 machine it tries to find the nvoptix.dll in C:\windows\system32, where it isn’t, but on the working desktop (Windows 10) it looks in the appropriate DriverStore folder and finds it.
I’m happy to share the Process Monitor logs in a PM, if you have any time to analyze them.
Windows Server 2016:
10:21:10.8484903 AM optixConsole.exe 18880 CreateFile C:\Windows\System32\nvoptix.dll NAME NOT FOUND Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a
Windows 10:
10:19:48.9627260 AM optixConsole.exe 9912 CreateFile C:\Windows\System32\DriverStore\FileRepository\nv_desktop_ref4i.inf_amd64_87fe2a70eb0ad3fe\nvoptix.dll SUCCESS Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a, OpenResult: Opened
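For longer logs, lines like the two above can be filtered with a short script. The following Python sketch assumes the exported text keeps the column order seen here (time, process, PID, operation, path, result, details) and only recognizes the two results that appear in this thread. The names parse_line and failed_lookups are hypothetical, and the sample line has its detail column shortened.

```python
import re

# Assumed layout: time, process, PID, operation, path, result, then details.
LINE_PATTERN = re.compile(
    r"^(?P<time>\S+ [AP]M) (?P<process>\S+) (?P<pid>\d+) (?P<operation>\S+) "
    r"(?P<path>.+?) (?P<result>SUCCESS|NAME NOT FOUND) Desired Access:"
)


def parse_line(line):
    """Split one log line into named fields, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def failed_lookups(lines, dll_name):
    """Return the paths where a lookup of dll_name ended in NAME NOT FOUND."""
    failed = []
    for line in lines:
        fields = parse_line(line)
        if fields is None or fields["result"] != "NAME NOT FOUND":
            continue
        if fields["path"].lower().endswith("\\" + dll_name.lower()):
            failed.append(fields["path"])
    return failed


# Sample taken from the Windows Server 2016 log above (details shortened).
SERVER_LINE = (
    r"10:21:10.8484903 AM optixConsole.exe 18880 CreateFile "
    r"C:\Windows\System32\nvoptix.dll NAME NOT FOUND "
    r"Desired Access: Read Attributes"
)

if __name__ == "__main__":
    print(failed_lookups([SERVER_LINE], "nvoptix.dll"))
```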
Yes, all OptiX versions before 6.0.0 contain the OptiX core implementation inside the SDK’s optix..dll shipped with the applications. It moved into the display driver with OptiX 6.0.0 and the optix..dll is just a wrapper.
That’s the problem. With which driver was that? Both 419.69 and 425.25?
If that system has only four Tesla P100s in TCC mode, which vendor’s display driver is running when accessing it bare metal and when running via RDP?
I’ll file a bug for analysis.
I copied nvoptix.dll and nvrtum64.dll to c:\windows\system32\ and now my application works.
I ran the last test with driver 425.31 - I will roll back to 425.25.
I don’t have physical access (the datacenter is a few hundred miles away…); the RDP login shows Microsoft Basic Display Adapter as the fifth display adapter.