OptiX 5.1.1 precompiled samples on GeForce RTX 3060 do not work

Good morning. I’ve changed computers and I’m having trouble porting my program to OptiX 6.5: selectors have been removed, and I’m having problems with the shadow calculation.

I wanted to try OptiX 5.1 instead, but the NVIDIA samples work on a GTX 1660 Super (desktop) and not on a GeForce RTX 3060 Laptop GPU.

If I run “optixDeviceQuery” (OptiX 5.1.1), I get:

```
OptiX 5.1.1
Number of Devices = 1

Device 0 (0000:01:00.0): NVIDIA GeForce RTX 3060 Laptop GPU
Compute Support: 8 6
Total Memory: 6442450944 bytes
Clock Rate: 1425000 kilohertz
Max. Threads per Block: 1024
SM Count: 30
Execution Timeout Enabled: 1
Max. HW Texture Count: 1048576
TCC driver enabled: 0
CUDA Device Ordinal: 0

Constructing a context…
Created with 1 device(s)
Supports 2147483647 simultaneous textures
Free memory:
Device 0: 5407899648 bytes
```

whereas if I run “optixConsole”, I get:

OptiX error: Unknown error (Details: Function "_rtBufferCreate" caught exception: Encountered a rtcore error: m_exports->rtcDeviceContextCreateForCUDA( context, devctx ) returned (2): Invalid device context)

Is there any hope of using OptiX on my new PC? (I don’t want to port the app to OptiX 7.x; it would be too expensive.)


OptiX 5 is not supported on Ampere GPUs. It’s too old for that.
With OptiX 6, the OptiX core implementation moved into the display driver so that newer GPU architectures can be handled going forward.
This was answered before here: https://forums.developer.nvidia.com/t/optix-5-1-1-crashes-on-a100-gpu-on-attempt-to-create-buffer/178530

I have some old code using OptiX 5.1, and we upgraded some systems to RTX boards. I do plan to migrate to OptiX 6.x/7.x in the near future. Meanwhile, is there a way to programmatically detect that OptiX 5.1 is not compatible with the current device, so I can report an error instead of crashing?
Thank you

I believe you should be able to call rtContextCreate() and check whether it returns RT_SUCCESS. If it returns RT_ERROR_NO_DEVICE instead, you could stop and display an error message to the user.


Hi David

OptiX 5.1 was fine on our GTX boards but terminates on the RTX ones. rtpContextCreate succeeds, but the program crashes in rtpModelUpdate.
OptiX 6.5 works fine, but until we migrate to 6.5 or above, I just need to handle this gracefully.

Here is the crash I get when running the OptiX 5.1 sample primeSimple on an RTX board:

Using cuda context
Error at <</root/sw/wsapps/raytracing/rtsdk/rel5.1/samples_sdk/primeSimple/primeSimple.cpp(158): Function "RTPresult _rtpModelUpdate(RTPmodel, unsigned int)" caught exception: Encountered a CUDA error: radix_sort_temp_size -> cub::DeviceRadixSort::SortPairs returned (8): invalid device function' (999)

Unfortunately, OptiX Prime is not compatible with Ampere GPUs on recent display drivers, even in the OptiX SDK 6.5.0 release.

That API has been removed from the OptiX 7 SDKs because it does not make use of the RTX ray tracing hardware units.

The only feasible, future-proof solution would be to port the OptiX Prime application to the OptiX 7 API, which should actually not be too difficult given the limited features the OptiX Prime API offered.
The OptiX SDKs have contained an example named optixRaycasting for quite some time now (even in the OptiX 6 versions), which demonstrates the approach described below.

OptiX Prime applications only use it for the ray-triangle intersection part, within a similarly limited acceleration structure hierarchy (one instance level over triangle geometry). Everything around that, i.e., ray generation and shading calculations, happens outside of OptiX Prime, usually in native CUDA kernels. That part can be reused completely when only the ray-triangle intersection is reimplemented with OptiX 7.

OptiX Prime only supported a completely flat hierarchy (triangle geometry only) or a single-level hierarchy (instances of triangle geometry), both of which are fully hardware-accelerated cases in OptiX 7 on RTX boards (e.g., look for OPTIX_TRAVERSABLE_GRAPH_FLAG_ALLOW_SINGLE_LEVEL_INSTANCING inside the OptiX 7 docs).

On the host side, the code that builds the acceleration structure and defines the geometric primitives would need to be changed. On the device side, the ray generation program takes your ray query data and shoots the rays. Only one closest-hit program would be needed, because all it does is return hit results for triangles. A miss program wouldn’t be needed either, since a miss can be covered by the default initialization of the hit result (a negative t_hit indicates a miss).
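To make the “negative t_hit means miss” convention concrete, here is a CPU-only C++ sketch of what such a ray query produces. It is not OptiX code: the Möller–Trumbore ray-triangle test stands in for the hardware intersection OptiX 7 would perform, a brute-force loop stands in for the BVH, and the types Vec3, Ray, Triangle, HitResult and the function traceAll are hypothetical. Every hit record starts out as a miss (t = -1), only rays that actually hit a triangle overwrite it with the closest hit, and shading code afterwards can simply skip records with t < 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types; none of these are OptiX types.
struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin; Vec3 direction; };
struct Triangle { Vec3 v0, v1, v2; };

// Hit record: the default state (t < 0) means "miss", so no separate
// miss handling is needed.
struct HitResult {
    float t = -1.0f;
    int triangleId = -1;
    float u = 0.0f;
    float v = 0.0f;
};

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Moller-Trumbore ray-triangle test. On a hit in front of the ray origin,
// writes the distance t and barycentrics u, v and returns true.
static bool intersect(const Ray& ray, const Triangle& tri, float& t, float& u, float& v) {
    const float eps = 1e-7f;
    const Vec3 e1 = sub(tri.v1, tri.v0);
    const Vec3 e2 = sub(tri.v2, tri.v0);
    const Vec3 p = cross(ray.direction, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;  // ray parallel to the triangle plane
    const float invDet = 1.0f / det;
    const Vec3 s = sub(ray.origin, tri.v0);
    u = dot(s, p) * invDet;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    v = dot(ray.direction, q) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * invDet;
    return t > eps;  // reject hits behind the ray origin
}

// Closest-hit query over all triangles (brute force instead of a BVH).
std::vector<HitResult> traceAll(const std::vector<Ray>& rays,
                                const std::vector<Triangle>& tris) {
    std::vector<HitResult> hits(rays.size());  // every record starts as a miss
    for (std::size_t r = 0; r < rays.size(); ++r) {
        for (std::size_t i = 0; i < tris.size(); ++i) {
            float t, u, v;
            if (intersect(rays[r], tris[i], t, u, v) &&
                (hits[r].t < 0.0f || t < hits[r].t)) {
                hits[r].t = t;
                hits[r].triangleId = static_cast<int>(i);
                hits[r].u = u;
                hits[r].v = v;
            }
        }
    }
    return hits;
}
```

With one triangle at z = 5 and a second one at z = 3, a ray from the origin along +z ends up with t = 3 and triangleId 1, while rays that miss or point away keep the default t = -1.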

The benefit of using the OptiX 7 API would be full RTX hardware acceleration of the BVH traversal and ray-triangle intersection. Additionally, you could handle other primitive types and use a more flexible scene hierarchy, fully custom ray query and hit result data, and some more options.

Once that is working, moving the ray generation and shading calculations into OptiX 7 device code as well, i.e., using the whole ray tracing pipeline, would increase performance even more.

Meanwhile is there a way to programmatically find if Optix5.1 is not compatible on current device, so I can error appropriately instead of crashing?

You could query the CUDA device properties with the CUDA Runtime API, or the device attributes with the CUDA Driver API, to get the streaming multiprocessor (SM) version, and reject architectures that are too new.
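A minimal C++ sketch of that rejection check might look like the following. It assumes the compute capability has already been read, for example with cudaDeviceGetAttribute using cudaDevAttrComputeCapabilityMajor and cudaDevAttrComputeCapabilityMinor (or via cudaGetDeviceProperties), and it treats SM 8.0 (Ampere) and newer as unusable for OptiX Prime, per the replies in this thread. The constant kFirstUnsupportedSm and the functions smVersion and isOptixPrimeUsable are hypothetical; OptiX itself does not publish a maximum supported compute capability, so the threshold is something the application maintains itself.

```cpp
#include <cassert>

// Encodes a compute capability major.minor as one integer (7.5 -> 75,
// 8.6 -> 86) so versions compare in the natural order. Valid while minor < 10.
constexpr int smVersion(int major, int minor) { return major * 10 + minor; }

// Hypothetical application-maintained threshold: the OptiX Prime path
// (OptiX 5.1) is treated as unusable from Ampere (SM 8.0) onward.
constexpr int kFirstUnsupportedSm = smVersion(8, 0);

// Returns true if the OptiX Prime code path may be used on a GPU with the
// given compute capability. In a real application, major/minor would be
// queried with the CUDA Runtime API right after CUDA initialization and
// before any OptiX Prime call.
bool isOptixPrimeUsable(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor < 10);
    return smVersion(major, minor) < kFirstUnsupportedSm;
}
```

For the GPUs in this thread, isOptixPrimeUsable(7, 5) (GTX 1660) returns true and isOptixPrimeUsable(8, 6) (RTX 3060) returns false, at which point the application can print an error and exit instead of crashing in rtpModelUpdate.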

Thank you, that’s a pretty detailed explanation. I am looking forward to migrating to OptiX 7.x.

Meanwhile, about checking whether I can run OptiX 5.1 on the current machine: the two links you provided lead to the same computeCapability major on our GTX and RTX systems. Shouldn’t I be checking the maximum compute capability that an OptiX version supports and comparing it against the device’s? Is that possible, or am I missing the point?

The issue is that OptiX doesn’t really specify a maximum compute capability which it supports, only a minimum, which has been Maxwell GPUs since OptiX 6.0.0.
Each OptiX SDK version’s release notes list the GPU architectures supported at the time of its release, which naturally doesn’t include the ones shipped later.
That OptiX Prime stopped working on Ampere was actually unexpected. It is due to discontinued support for some specific CUDA instructions that are still used inside its old acceleration structure builder kernels. That code is part of the old SDK, so there was also no way to solve this with a driver update, as is possible with the higher-level OptiX API implementation that has lived inside the display drivers since OptiX 6. Note that OptiX 7 is a header-only API.

Anyway, I thought you were interested in rejecting specific GPU devices before doing anything with OptiX. When using OptiX Prime, you most likely already have a CUDA initialization part inside your application, after which you could call either of these queries to determine the GPU architecture and, if it’s Ampere or newer, exit gracefully instead of crashing in the incompatible OptiX Prime acceleration structure builder.

Here is a nice list of what GPU has what compute capability: https://en.wikipedia.org/wiki/CUDA

The 2 links you provided leads to same computeCapability majors on our GTX and RTX systems.

For which GPUs?
I meant to use both the major and minor compute capability to distinguish the GPUs.
Your GTX 1660 should be SM 7.5 (Turing) and an RTX 3060 should be SM 8.6 (Ampere).
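Because the distinction needs major and minor together (Volta is 7.0/7.2 while Turing is 7.5, and Ampere spans 8.0 to 8.7 while Ada is 8.9), a small lookup like the hypothetical helper below can make log messages clearer when a device is rejected. The mapping is an assumption drawn from public compute capability tables such as the Wikipedia list linked above; it covers only Kepler through Hopper, and anything else falls through to "Unknown".

```cpp
#include <cassert>
#include <string>

// Maps a CUDA compute capability to its GPU architecture family.
// Only covers Kepler through Hopper; other values return "Unknown".
std::string architectureName(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    switch (major) {
        case 3: return "Kepler";
        case 5: return "Maxwell";
        case 6: return "Pascal";
        case 7: return minor >= 5 ? "Turing" : "Volta";         // 7.0/7.2 Volta, 7.5 Turing
        case 8: return minor == 9 ? "Ada Lovelace" : "Ampere";  // 8.0/8.6/8.7 Ampere, 8.9 Ada
        case 9: return "Hopper";
        default: return "Unknown";
    }
}
```

For the two GPUs discussed here, architectureName(7, 5) returns "Turing" (GTX 1660) and architectureName(8, 6) returns "Ampere" (RTX 3060 Laptop GPU).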
Is that not the case, or is the one with SM 7.x also not working?

Got it. You are right: for the time being I just need to reject running OptiX 5.1 on specific devices, so I could just check deviceComputeCapabilityMajor < 7.
Instead of hardcoding that, I was wondering whether there was a suitable variable, so I could do
deviceComputeCapabilityMajor < optix.maxComputeCapabilityMajor.

But I have what I need. Thanks for all the help; I look forward to OptiX 7.