API for BVH Traversal on Turing GPUs


I’ve got a handful of scientific computing apps that run on CUDA, and a big performance bottleneck for these codes has been geometry search (e.g. locating parts of large meshes that are in contact). I’ve been using a BVH implementation based on the blog posts by Tero Karras:

This has been a huge win for performance so far, and I was excited to see that the new Turing cards have dedicated hardware for BVH traversal, which is by far the most expensive part of the search process for me.
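For context, the construction step in those posts boils down to assigning each primitive a Morton code and sorting by it; a minimal CPU-side sketch of that key step (function names are my own, not from any NVIDIA API):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Insert two zero bits between each of the low 10 bits of v
// (the bit-interleaving trick used in the Karras posts).
uint32_t expand_bits(uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point in the unit cube [0,1]^3;
// sorting primitives by this key groups spatially nearby ones,
// which is what makes the parallel LBVH build work.
uint32_t morton3D(float x, float y, float z) {
    x = std::fmin(std::fmax(x * 1024.0f, 0.0f), 1023.0f);
    y = std::fmin(std::fmax(y * 1024.0f, 0.0f), 1023.0f);
    z = std::fmin(std::fmax(z * 1024.0f, 0.0f), 1023.0f);
    uint32_t xx = expand_bits(static_cast<uint32_t>(x));
    uint32_t yy = expand_bits(static_cast<uint32_t>(y));
    uint32_t zz = expand_bits(static_cast<uint32_t>(z));
    return xx * 4 + yy * 2 + zz;
}
```

The build itself then parallelizes cleanly, but the traversal that follows is where most of the time goes in my codes.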

Do we know if the API for accessing this BVH traversal hardware will be made public, or will it only be accessible through NVIDIA’s ray tracing libraries?



You could attempt to get access to a CUDA 10 early-access version to see whether such an API is provided.


Once you have access rights in your developer account, the early access information can be found here https://developer.nvidia.com/cuda-early-access

The early access program is by invitation only, I do not know if the above link allows applying for access if you don’t already have permissions.

I also find this legal blurb on said web page: "Access to the EA program is restricted. So please remember that by participating in this program, you agree not to share any binaries or documentation and you must not discuss this program or any details of the features, functionality or performance on any public forums or with other developers. You may discuss any technical issues with your NVIDIA Contact or use the bug submission form below for technical questions, feedback, bugs, issues and feature requests.

Being part of the EA program gives you the unique opportunity to directly influence the quality and performance of the final production version of the CUDA Toolkit. You are encouraged to try out the new features and provide us feedback. We expect that you will compile your current applications with the new toolkit and report any functional and performance changes including improvements and any regressions to us."

This pretty much says that you cannot publicly disclose your findings about CUDA 10 yet if obtained from/through the Early Access software.

Beyond this, the early access terms of use should be studied carefully.

There is an even longer version (PDF) here:



It would appear that currently the only public APIs for making use of RTX will be:

DirectX Raytracing (DXR) http://forums.directxtech.com/index.php?topic=5860.0
NVIDIA OptiX https://devblogs.nvidia.com/nvidia-optix-ray-tracing-powered-rtx/
NVIDIA GameWorks https://developer.nvidia.com/gameworks-ray-tracing

Note that there is Early Access for some of this (GameWorks).

For Microsoft DXR one has to wait until the next feature update to Windows 10. Maybe some fast-ring or beta channels offer this already.

I’m disappointed to hear that we probably won’t have a way to access the RT Core hardware directly. Ray tracing is useful, but BVH traversal is the more general tool here, and it’s a shame to let that dedicated hardware go unused!
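To illustrate what I mean by "more general": the operation I'd want accelerated is just a stack-based box query against the tree, with no rays involved. A minimal CPU-side sketch (the node layout here is illustrative, not any particular library's):

```cpp
#include <cassert>
#include <vector>

// Axis-aligned bounding box.
struct AABB { float lo[3], hi[3]; };

// Flat binary BVH node: internal nodes store child indices,
// leaves store a primitive index (left == -1 marks a leaf).
struct Node {
    AABB box;
    int left = -1, right = -1;  // child node indices
    int prim = -1;              // primitive index if leaf
};

bool overlaps(const AABB& a, const AABB& b) {
    for (int i = 0; i < 3; ++i)
        if (a.hi[i] < b.lo[i] || b.hi[i] < a.lo[i]) return false;
    return true;
}

// Collect all primitives whose boxes overlap `query`, using an
// explicit stack rather than recursion, as one would in a kernel.
std::vector<int> query_bvh(const std::vector<Node>& nodes, const AABB& query) {
    std::vector<int> hits;
    if (nodes.empty()) return hits;
    std::vector<int> stack{0};  // start at the root, node 0
    while (!stack.empty()) {
        int n = stack.back(); stack.pop_back();
        const Node& node = nodes[n];
        if (!overlaps(node.box, query)) continue;  // prune this subtree
        if (node.left < 0) hits.push_back(node.prim);
        else { stack.push_back(node.left); stack.push_back(node.right); }
    }
    return hits;
}
```

Contact search is exactly this, with one query box per mesh element; the box-test-and-descend loop is what I'd hoped the RT cores could take over.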

Is there any way (other than these forums) to figure out more information about whether this might be supported in future CUDA releases?

I am sure someone will figure out the Turing instruction set in no time.

Looking at you, scottgray ;-)

For future CUDA releases, try to get into the Early Access programs for the CUDA toolkits.
You can’t publicly disclose any information obtained from EA code, but it might still be useful to you.

Thank you for the input, cbuchner. I applied to the early access program a month ago and it is still pending, so it seems that probably won’t work out, unfortunately.

Any updates?

same question, any updates?

Unfortunately, I didn’t find any way to access the RT cores from CUDA directly.

Currently, the recommended approach to accessing the RTX core functionality is via OptiX.

Currently, there is no direct access method from CUDA.

CUDA and OptiX can interoperate:

I wanted to get a 2070 to play with the RT capabilities from CUDA, but it seems that is not possible. It seems I should choose something different. A 1660? But that is a step backward.
I suspect OptiX is strongly proprietary and its license is much more expensive than a 2080 Ti.

From published reviews, the GTX 1660 Ti does not include RT capabilities. It is targeted at a lower price point, which requires a smaller die. It is basically a faster successor to the GTX 1060 with roughly the performance of a GTX 1070.

On multiple sites, I see people commenting negatively on the lack of a huge performance boost between recent GPU generations. News flash: Moore’s Law is over, and therefore the future will bring us relatively small incremental improvements, for GPUs, CPUs, and all things electronic. As a corollary, deflationary pricing of semiconductor products has come to an end as well: expect to pay more for more capable hardware. While there will still be some improvements to transistor speed going forward, transistors will not get cheaper and might even get more expensive because of the added steps in the complicated manufacturing process.

I know that the GTX 1660 Ti does not include RT capabilities; I emphasized that intentionally. I want to investigate the new hardware features of the RTX 20** series, and currently I can’t do that from CUDA (the API of my choice).
I know the rasterizer is also not accessible from the CUDA API, but that is not such a problem.

As the Rolling Stones sang: You can’t always get what you want. You could file feature requests with NVIDIA through the bug reporting mechanism.

Personally, I think NVIDIA should resist the temptation of trying to expose any and all hardware capabilities through CUDA. I would not want to see CUDA turn from a tidy general-purpose C++ programming environment into an ever-expanding messy hodgepodge of special hardware support (any more than it already has become).

I think support for RT cores would be no messier than the already introduced support for tensor cores.
I know I can use matrix operations (i.e., tensor cores) to speed up ray-triangle intersections, but I don’t want to do it that way when I could use the specialized hardware directly.
Special hardware support does not require your attention at all. In C++ you don’t pay for things you don’t use: just omit the corresponding header and you won’t even get any namespace pollution.
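For illustration, the per-triangle test that the RT cores offload is, in software, essentially the classic Möller–Trumbore routine; a minimal CPU-side sketch (names and types are my own):

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <optional>

using Vec3 = std::array<float, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
float dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]};
}

// Moller-Trumbore: returns the distance t along the ray if it hits
// the triangle (v0, v1, v2), or std::nullopt on a miss.
std::optional<float> ray_triangle(const Vec3& orig, const Vec3& dir,
                                  const Vec3& v0, const Vec3& v1, const Vec3& v2) {
    const float eps = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return std::nullopt;  // ray parallel to triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;                      // first barycentric coord
    if (u < 0.0f || u > 1.0f) return std::nullopt;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;                    // second barycentric coord
    if (v < 0.0f || u + v > 1.0f) return std::nullopt;
    float t = dot(e2, q) * inv;
    return t > eps ? std::optional<float>(t) : std::nullopt;
}
```

Recasting this as matrix math to feed the tensor cores is possible, but it is a workaround; the RT cores already implement the real thing in fixed function.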

There is cost in complexity and NVIDIA’s resources aren’t limitless. Opportunity cost is a thing. I think domain-specific languages could be a possible alternative here. Whether OptiX specifically is the right environment for pursuing RT-based solutions I cannot say, I haven’t used it.

I spent enough time in the development of x86 processors to see the drawbacks of instruction sets growing to hundreds (now: thousands) of instructions. And I have seen the C++ specification balloon to 1200 pages in an everything-plus-the-kitchen-sink approach. I certainly have lost the overview.

I understand that we are probably at a fundamental philosophical disagreement here.

I understand your point of view.
In this case I think NVIDIA is just protecting its own software assets (OptiX). They do not want to depreciate years of development. If everyone started to release competing ray tracers, OptiX would surely become less profitable than it is now.
Also, there is no cost in complexity, because the hardware in question is already exposed somehow through the OptiX library.

I don’t claim to know how these decisions are motivated, but it certainly seems possible. Given NVIDIA’s high level of spending on R&D relative to industry averages, at up to 20% of revenue in recent years, it would seem understandable and fair that they’d like to see return on that investment.

Based on past experience in the semiconductor industry I would say that spending 20% on R&D is usually not a long-term sustainable level of investment, so I would expect this percentage to drop a couple years out.

OptiX 6 is free to use within any application, including commercial and educational applications.

I hope it will be enough for me.