I am trying to use an NVIDIA Tesla K80 GPU with an Ubuntu 18.04 instance on Google Cloud.
- Installed CUDA 11
- Installed the NVIDIA driver [cuda-drivers_450.51.06-1_amd64.deb]
- After reboot, confirmed that the nvidia-smi output shows the driver is installed.
- Checked that the nouveau driver is blacklisted, etc.
- Using the OptiX 6.5 SDK, compiled the example code optixDeviceQuery
When I run it with strace I get this error trace. What am I doing wrong? Any pointers to fix this are much appreciated.
======================================
write(2, "A supported NVIDIA GPU could not"…, 105A supported NVIDIA GPU could not be found
(/home/ravi/optix/SDK/optixDeviceQuery/optixDeviceQuery.cpp:55)) = 105
write(2, "'\n", 2'
) = 2
You’re using an unsupported combination of GPU architectures and OptiX versions.
Please always refer to the OptiX release notes, linked directly under the respective OptiX download button on developer.nvidia.com, for supported GPUs and required minimum driver versions.
Find a link to older OptiX versions at the bottom of this page: https://developer.nvidia.com/designworks/optix/download
Copying my answer from here: https://forums.developer.nvidia.com/t/optix-error-failed-to-load-optix-library/70671/27
- First, an NVIDIA Tesla K80 is a Kepler based GPU. That GPU architecture has not been supported by OptiX since version 6.0.0. This means your system setup is limited to OptiX 5.1 and older versions.
- Versions 6.0.0 and before would not have this issue with driver components because the OptiX core implementation does not reside inside the display drivers. See posts above.
- Even with a Maxwell or newer GPU, OptiX 6.5.0 and newer are not supported by 418 drivers.
- (A similar issue under Windows 10 based OSes with all GPUs in the TCC driver mode where not all required driver components could be found with OptiX 6.5.0 and higher has been fixed inside the 450 driver releases. See versions in the OptiX 7.1.0 Release Notes.)
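The GPU architecture rule from the list above can be sketched as a small shell check. This is only an illustration: `optix6_gpu_check` is a hypothetical helper, not part of OptiX or CUDA, and it assumes that "Maxwell or newer" corresponds to CUDA compute capability 5.0 and up (a Tesla K80 reports 3.7). The compute capability has to be supplied by hand, for example from the output of the CUDA deviceQuery sample.

```shell
# Hypothetical helper (not part of OptiX or CUDA).
# Prints whether a GPU with the given compute capability is in the range
# that OptiX 6.0.0 and newer accept.
# Assumption: "Maxwell or newer" means compute capability 5.0 and up.
optix6_gpu_check() {
    cc_major=${1%%.*}    # keep only the major number, "3.7" -> "3"
    if [ "$cc_major" -ge 5 ]; then
        echo "supported"
    else
        echo "unsupported"
    fi
}
```

For example, `optix6_gpu_check 3.7` covers the K80 case and `optix6_gpu_check 6.1` a Pascal card. The driver version requirements from the release notes still apply on top of this check.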
Thank you for your prompt help. All this time I was under the impression that the NVIDIA driver was the issue. While I was aware that newer OptiX versions are required for the latest cards, I did not know that you can’t use the newer OptiX versions with older cards. I wish there was a compatibility table up front. (CUDA/OptiX/Driver)
Anyway, the Google Cloud offerings seem to be limited to Tesla GPUs (T4, P100, V100, P4). I am not sure if any of these can be used with OptiX 6.5/6 and CUDA 11 with the NVIDIA 450 driver.
I am trying to keep my compute expenditure as low as I can without refactoring the code.
I’ve not used any cloud systems myself, but all of the NVIDIA Tesla products you listed now are supported by OptiX 6.0.0 and up.
You can find the underlying GPU architectures in this table: https://en.wikipedia.org/wiki/Nvidia_Tesla
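As a quick reference for the Tesla models mentioned in this thread, the lookup in that table can be sketched as a shell function. `tesla_arch` is a hypothetical helper name, and the mapping only covers the five models discussed here; anything else needs to be looked up in the table itself.

```shell
# Hypothetical helper: maps the Tesla models named in this thread to their
# GPU architecture. Models not discussed here are reported as "unknown".
tesla_arch() {
    case "$1" in
        K80)      echo "Kepler" ;;
        P4|P100)  echo "Pascal" ;;
        V100)     echo "Volta" ;;
        T4)       echo "Turing" ;;
        *)        echo "unknown" ;;
    esac
}
```

Per the statement above, every model except the Kepler based K80 falls into the range supported by OptiX 6.0.0 and up.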
The display driver versions on the remote systems would be the limiting factor. OptiX 7.1.0 strictly requires release 450 drivers. The same applies to Windows 10 based OSes with all GPUs in TCC mode, which is the case for pure Tesla systems with no other NVIDIA GPU installed.
There might be compatibility issues with PTX code generated by CUDA 11 on the older of these OptiX versions.
CUDA 10.x should work without issues if you compile against SM 5.0 (Maxwell).
I wish there was compatibility table up front. (Cuda/Optix/Driver)
Again, always refer to the OptiX release notes for the recommended drivers and CUDA toolkits per OptiX version.
That is listed under System Requirements on the very first page there.
Thank you for your kind response. I have now moved to a P4 GPU (Pascal arch). optixDeviceQuery works :)).
But the optixConsole example fails:
./optixConsole
OptiX error: NVRTC Compilation failed.
nvrtc: error: invalid value for --gpu-architecture (-arch)
Is this because I am using CUDA 11 instead of CUDA 10.x?
Yes, that could be a problem. Which --gpu-architecture did it use?
CUDA 11 removed support for SM 3.0 (early Kepler) and deprecated all SM versions up to and including 5.0 (Maxwell).
If it tried to compile for SM 3.0 that should fail on CUDA 11.
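To make that rule concrete, here is a minimal shell sketch that classifies an SM version under CUDA 11. `cuda11_sm_status` is a hypothetical helper, not an NVIDIA tool. It assumes that everything below SM 3.5 counts as removed (the statement above only names SM 3.0) and that everything from SM 3.5 up to and including SM 5.0 is deprecated.

```shell
# Hypothetical helper: classifies an SM version under CUDA 11.
# Accepts "30", "sm_30" or "compute_30" style arguments.
# Assumption: all versions below SM 3.5 are treated as removed.
cuda11_sm_status() {
    sm=${1##*_}    # strip an optional "sm_" or "compute_" prefix
    if [ "$sm" -lt 35 ]; then
        echo "removed"
    elif [ "$sm" -le 50 ]; then
        echo "deprecated"
    else
        echo "supported"
    fi
}
```

A "removed" result for the architecture passed to NVRTC would match the `invalid value for --gpu-architecture` error, while "deprecated" should only produce warnings.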
Note that you can disable the use of the CUDA runtime compiler NVRTC for the SDK examples globally by deselecting the CUDA_NVRTC_ENABLED switch inside the CMake GUI.
Then the CUDA source code will be pre-compiled using the NVCC compiler.
The OptiX 7.1.0 SDK examples use SM 6.0 (Pascal) by default to not get the CUDA 11 deprecation warnings.
If you want to run the examples on Maxwell, you’d need to change the target SM version down to 5.0 again.
https://forums.developer.nvidia.com/t/optix-7-1-issue-with-running-samples-on-a-maxwell-card/140118/2
Note that the built SDK examples will only work at their original location unless there are two environment variables set.
Look at the functions getSampleDir() and samplePTXFilePath().
Here’s why: https://forums.developer.nvidia.com/t/sdk-samples-sutil-getptxstring-file-path/70963/2
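One way to find those two environment variables without reading through the whole source is to search the SDK sources for environment lookups. The sketch below assumes the helper functions read the variables with the C library call `getenv`; `list_getenv_calls` is a hypothetical helper name, and the directory to search is whatever location your SDK sources live in.

```shell
# Hypothetical helper: lists every line under the given directory that
# mentions getenv, with file name and line number.
# Assumption: the SDK helper functions read their overrides via getenv().
list_getenv_calls() {
    grep -rn 'getenv' "$1"
}
```

Running it on the SDK source directory should show the variable names as string arguments of the `getenv` calls, which can then be exported before starting an example from a different location.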
Thank you for the detailed help. I finally managed to get to run the examples thanks to you.
What is the best way to turn off just-in-time compilation? Is just commenting out the line in the config file sufficient? I am not too familiar with CMake, and I don’t use the GUI.
// Signal whether to use NVRTC or not
#cmakedefine01 CUDA_NVRTC_ENABLED
// NVRTC compiler options
#define CUDA_NVRTC_OPTIONS @CUDA_NVRTC_OPTIONS@
Just search for CUDA_NVRTC_ENABLED inside the provided OptiX SDK CMakeLists.txt files, not inside your build folder, where the solution has already been generated with that setting.
Find this CMake set() instruction:
# Select whether to use NVRTC or NVCC to generate PTX
set(CUDA_NVRTC_ENABLED ON CACHE BOOL "Use NVRTC to compile PTX at run-time instead of NVCC at build-time")
and change the default value from ON to OFF.
Then delete the CMake cache and generate the solution again.
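For a command line only workflow, the same result can probably be reached without editing the file, because CMake lets a cache variable be set with `-D` when configuring. The following is a sketch under assumptions: `reconfigure_without_nvrtc` is a hypothetical helper, the first argument is the SDK folder that contains the top-level CMakeLists.txt, and the second is a build folder of your choice.

```shell
# Hypothetical helper: configures the SDK examples with NVRTC disabled.
#   $1 = SDK source folder containing the top-level CMakeLists.txt
#   $2 = build folder (created if missing)
reconfigure_without_nvrtc() {
    # Resolve the source folder to an absolute path; fail if it is missing.
    sdk_dir=$(CDPATH= cd -- "$1" 2>/dev/null && pwd) || return 1
    build_dir=$2
    if [ ! -f "$sdk_dir/CMakeLists.txt" ]; then
        echo "no CMakeLists.txt found in $sdk_dir" >&2
        return 1
    fi
    mkdir -p "$build_dir" || return 1
    # Delete the old cache so no stale setting survives.
    rm -f "$build_dir/CMakeCache.txt"
    # -D sets the cache variable, overriding the ON default in CMakeLists.txt.
    ( cd "$build_dir" && cmake -DCUDA_NVRTC_ENABLED=OFF "$sdk_dir" )
}
```

A call such as `reconfigure_without_nvrtc /home/ravi/optix/SDK /home/ravi/optix/build` (the build path is only an example) would then be followed by the usual `make` inside the build folder.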