[bugreport & fix] OptiX 7 Corrupts CUdeviceptr in the SBT due to truncation [Hardcore]

TL;DR: If you have a CUdeviceptr that actually needs all 64 bits and use it as the base for any of the SBT entries, your optixLaunch will fail spectacularly (with a CUDA_ERROR_ILLEGAL_ADDRESS).

So to start this off…

Let’s note that on the device side (actual CUDA code) sizeof(CUdeviceptr) is 4 while its alignment is 8, unless I’m using NVRTC wrong somehow (I don’t see an explicit 64-bit option).
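A minimal device-side check of this, assuming your NVRTC include paths resolve cuda.h the same way mine do (the assert is just illustrative, not code from my engine):

#include <cuda.h>

// Under NVRTC with neither _WIN64 nor __LP64__ defined, cuda.h typedefs
// CUdeviceptr as a 4-byte unsigned int, so this assertion fires.
static_assert(sizeof(CUdeviceptr) == 8, "CUdeviceptr is not 64 bits on the device side");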

I added some debug output to my program and saw that the CUdeviceptrs inside the SBT on the host side are using the upper 32 bits of their storage.

Here are my CUdeviceptrs for the raygen, miss and hit pointers in the SBT; as you can see, the 9th hex digit is 2.
0000000203800000 0000000203A00000 0000000203C00000

And here’s the result of printing the pointer from device code:
RayGenData* rtData = (RayGenData*)optixGetSbtDataPointer();

GPU 0000000003800020

As you can see, the leading 2 got truncated.

I had a look at the implementation of optixGetSbtDataPointer and it does an unsigned long long to CUdeviceptr cast.
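Roughly what that means when cuda.h was preprocessed without -D_WIN64 / -D__LP64__ (a sketch using the values from my dump above, not the actual header code):

typedef unsigned int CUdeviceptr32;                     // what NVRTC ends up seeing in my case
unsigned long long sbtDataPtr = 0x0000000203800020ull;  // raygen record base + 32-byte record header
CUdeviceptr32 truncated = (CUdeviceptr32)sbtDataPtr;    // == 0x03800020, the upper 32 bits are gone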

Now, I’d guess the SDK examples work because they don’t use OpenGL interop, the CUDA Driver API, or JIT compilation (unlike my setup, where OpenGL owns all buffers and images), and you’re unlikely to get a pointer outside the 32-bit range from cudaMalloc or cuMemAlloc.

But cuGraphicsResourceGetMappedPointer does return some pretty high values.
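For context, this is roughly how my buffers get their device addresses (the OpenGL buffer name is a placeholder and error handling is omitted); it’s this interop path, not cuMemAlloc, that hands back the high addresses:

#include <cuda.h>
#include <cudaGL.h>

// glBufferName is a GLuint owned by the OpenGL side (placeholder here)
CUgraphicsResource resource;
cuGraphicsGLRegisterBuffer(&resource, glBufferName, CU_GRAPHICS_REGISTER_FLAGS_NONE);
cuGraphicsMapResources(1, &resource, /*stream*/ 0);

CUdeviceptr ptr;  // 64 bits on the host
size_t size;
cuGraphicsResourceGetMappedPointer(&ptr, &size, resource);
// on my machine ptr lands around 0x200000000 and up, well outside the 32-bit range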

I’ve made a tag with a reproducible example from my engine,

as well as a binary build (unfortunately, running it requires the OptiX SDK headers to be present under “C:/ProgramData/NVIDIA Corporation/OptiX SDK 7.0.0/include”, because I’m JIT-compiling the raytrace programs).

This simple workaround fixes everything:
https://github.com/buildaworldnet/IrrlichtBAW/commit/64a38519a9001f611e11c10c353a31b792da8edf


Oh yeah, before anyone asks, my PTX begins with:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-27506705
// Cuda compilation tools, release 10.2, V10.2.89
// Based on LLVM 3.4svn
//

.version 6.5
.target sm_75
.address_size 64

so it’s not 32-bit or anything like that.

I’m using jitify.hpp, but nowhere does it seem to typedef CUdeviceptr to anything other than what the CUDA SDK declares.

I’m on CUDA SDK 10.2

Hi devsh, thank you for the report. You might want to compare your solution to this recent thread. Others found that their default version of stddef.h was causing the problem. https://devtalk.nvidia.com/default/topic/1070522/optix/raygen-sbt-data-access-results-to-illegal-memory-access/

It would probably help to report which versions of Visual Studio and CMake you have. If you are using VS2019, be aware the CMake setup doesn’t work out of the box. And even with VS2017 and the most recent CMake, I have to specify a 64-bit toolchain manually.


David.

For examples using the CUDA Driver API and OpenGL interop, please have a look at the OptiX 7 examples linked here:
https://devtalk.nvidia.com/default/topic/998546/optix/optix-advanced-samples-on-github/

Make sure to use a lower target than sm_75 during compilation if you’re planning to support more than just RTX boards.

Since I’m using NVRTC, I just figured I’d compile for the highest virtual compute arch compatible with the weakest device found on the system. If there is no Turing board, I keep dropping the NVRTC arch option all the way down to sm_30.
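Roughly like this (the weakest-device handle is assumed to come from my own enumeration code):

#include <cuda.h>
#include <cstdio>

int major = 0, minor = 0;
cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, weakestDevice);
cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, weakestDevice);
if (major > 7 || (major == 7 && minor > 5)) { major = 7; minor = 5; }  // cap at compute_75

char archOption[64];
snprintf(archOption, sizeof(archOption), "--gpu-architecture=compute_%d%d", major, minor);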

I’m always on the 64bit toolchain.

It’s fixed now; I did what the jitify folks (and the people in the post David linked) told me to do.

Now I simply define _WIN64 or __LP64__ for NVRTC when compiling my OptiX programs.
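In NVRTC terms it boils down to something like this (the program handle and the arch string are placeholders):

#include <nvrtc.h>

const char* options[] = {
    "--gpu-architecture=compute_75",  // or whatever got picked for the weakest device
#ifdef _WIN64
    "-D_WIN64",    // makes cuda.h pick the 8-byte CUdeviceptr typedef
#else
    "-D__LP64__",
#endif
};
nvrtcResult result = nvrtcCompileProgram(prog, 2, options);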

P.S. I do, however, now have a different problem.