OptiX debugging with Nsight VSE

Hi,

I am sorry for bypassing the pinned topic, but I was wondering if I could get a sneak peek into OptiX debugging with the latest Nsight VSE. I built the OptiX 6.5 SDK with Visual Studio 2017 and wanted to test the new debugging tool.

My question is: if I place a breakpoint somewhere inside an OptiX kernel, should running “Start CUDA Debugging (Next-Gen)” show me the state of the local variables declared in the kernel? And does that mean the kernels must be compiled with the “-G” (device debug) flag to generate the debug information? Thanks!

Hi, yes to all your questions. :) You should be able to use the latest Nsight VSE with OptiX as long as you’re on a 435 or newer driver. You can place breakpoints in your OptiX kernels, and the debugger should stop there and show you the state of local variables, as long as you compiled the kernels with debug info. If you don’t have debug symbols and line info compiled in, you should still be able to break and inspect the device state; you’ll just step through the SASS (GPU machine code) instead of the source. The debug info flag(s) you’d use depend on how you’re compiling your CUDA code. Are you having trouble getting the debugger to work?


David.

Hi David, thank you for the timely response :) Yes, I’ve been having some trouble getting the debugger to work.

I am using a GeForce RTX 2060 with the 436.15 driver, CUDA 10.1.243 and OptiX 6.5.

I built the OptiX 6.5 SDK using CMake 3.14.5; I checked “CUDA_NVRTC_ENABLED” for JIT compilation and specified the “-lineinfo” and “-G” flags in “CUDA_NVRTC_FLAGS”. Otherwise I left the CUDA compiler flags untouched, so with the above-mentioned flags added they read:

-arch;compute_30;-use_fast_math;-lineinfo;-default-device;-rdc;true;-D__x86_64;-G

Then I ran one of the samples (“optixSphere”) in Debug mode to inspect the CUDA compiler options; everything appeared in order, as specified in the CMake GUI. Finally, I received the following message in the console:

OptiX Error: 'Unknown error (Details: Function "_rtProgramCreateFromPTXString" caught exception:
 Compile Error: Unknown Value when trying to figure out pointer space for ray payload argument to rt_trace at: [ i64 %3 ])

If I don’t specify the debug flag “-G”, the application runs and I can indeed break and inspect the device state, as you said: the CUDA thread launch information (e.g. “threadIdx”, “gridDim”) and the state of the GPU registers. However, I am afraid I am not yet confident navigating this information, so I was hoping to get friendlier debug output.
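For readers hitting the same wall, here is a minimal C++ sketch of the configuration that did run in this thread, assuming the NVRTC options arrive as a CMake-style semicolon-separated list like the one above: split the list into individual options and drop the device-debug flag (“-G”, long form “--device-debug”) while keeping “-lineinfo”. The helper names `splitCmakeList`, `dropDeviceDebug` and `toCStrings` are hypothetical and not part of the OptiX SDK; the `const char*` array is simply the shape `nvrtcCompileProgram` takes for its options argument.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: split a CMake-style list ("a;b;c") into tokens,
// skipping empty entries such as those produced by ";;" or a trailing ";".
std::vector<std::string> splitCmakeList(const std::string& list) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (start <= list.size()) {
        std::string::size_type end = list.find(';', start);
        if (end == std::string::npos) end = list.size();
        if (end > start) tokens.push_back(list.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}

// Hypothetical helper: remove the device-debug flag in both spellings and
// keep every other option (including -lineinfo) in its original order.
std::vector<std::string> dropDeviceDebug(std::vector<std::string> opts) {
    opts.erase(std::remove_if(opts.begin(), opts.end(),
                              [](const std::string& o) {
                                  return o == "-G" || o == "--device-debug";
                              }),
               opts.end());
    return opts;
}

// NVRTC takes options as `const char* const*`. The returned pointers stay
// valid only while `opts` is alive and unmodified.
std::vector<const char*> toCStrings(const std::vector<std::string>& opts) {
    std::vector<const char*> out;
    out.reserve(opts.size());
    for (const auto& o : opts) out.push_back(o.c_str());
    return out;
}
```

As a usage sketch, `auto opts = dropDeviceDebug(splitCmakeList(flags)); auto cOpts = toCStrings(opts);` would yield eight options for the flag list quoted above, which could then be handed to `nvrtcCompileProgram(prog, static_cast<int>(cOpts.size()), cOpts.data())`. Whether NVRTC accepts each token exactly as written (for example “-arch” and “compute_30” as separate entries) is outside the scope of this sketch.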

Oh, that’s a good report and should be easily reproducible. I have seen that error message before; I think it’s related to code inlining during compilation. I am investigating, thank you for the report! I’ll check myself, but do you get the same result when debugging other samples?


David.

Thank you for your time! I tried a bunch of other samples; most of them give the identical error. The exception is “optixHello.cpp”, which gives this one instead:

OptiX Error: 'Unknown error (Details: Function “_rtProgramCreateFromPTXString” caught exception: Compile Error: OptiX state access (rtBuffer) failed in function (_Z16draw_solid_colorv_ptx0x58556504797a14c8). at: [ Instruction: %59 = call i64 @_rt_buffer_get_64(i64 %val.i18, i32 2, i32 16, i64 %val.i17, i64 %val.i16, i64 %val.i15, i64 %val.i14), !dbg !34, contained in basic block: tmp13, in function: _Z16draw_solid_colorv_ptx0x58556504797a14c8, in module: Canonical__Z16draw_solid_colorv from (api input string) associated DI info (Dir/File/Line): C:/ProgramData/NVIDIA Corporation/OptiX SDK 6.5.0/include/internal optix_internal.h 420, LL file line and column: 420:5 ])

As far as I remember, these are the same kind of errors I used to get when I first started using OptiX 5 with CUDA 9.2 and didn’t know I should not compile OptiX kernels with the debug “-G” flag. That made me think there might be a rogue path in the CMake fields pointing to CUDA v9.2 somewhere (I have both versions installed on the computer, though I am not sure that’s relevant to the problem at hand). I double-checked, but there is no trace of CUDA v9.2 either in the CMake fields or in the environment variables of my OS (Windows).

For my first attempt, I installed a clean OptiX 6.5.0 and the newest driver (436.30), and I don’t get a CUDA compile error. It breaks in the CUDA code and I can see the launch details, but no local variables. I also had to start Visual Studio as Administrator to get profiling to work properly. It’s possible the compile error was fixed in the driver update. I’m also curious: do you still have CUDA 9 on your system in addition to CUDA 10.1?


David.

Thank you for your time :) I updated the driver and tried running VS as Administrator, but to no avail (identical errors). I do have CUDA 9 on my system, but I couldn’t find any trace of it in the project properties of the built SDK samples, and the “CUDA_PATH” environment variable points to the CUDA 10 installation. I suppose I’ll try removing CUDA 9, maybe reinstalling CUDA 10, and see if that works.

I was curious whether the CUDA 9 executables or libraries might still be sneaking into your regular PATH variable and causing trouble.
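One quick way to check this is to scan the PATH entries for a CUDA 9 directory. The C++ sketch below assumes a Windows-style, semicolon-separated PATH and the default toolkit layout (e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin`); the function name `findPathEntries` is hypothetical, and the match is a plain case-sensitive substring search.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: return every entry of a ';'-separated PATH string
// that contains `needle` (case-sensitive substring match).
std::vector<std::string> findPathEntries(const std::string& path,
                                         const std::string& needle) {
    std::vector<std::string> hits;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type end = path.find(';', start);
        if (end == std::string::npos) end = path.size();
        std::string entry = path.substr(start, end - start);
        // Empty entries (";;" or a trailing ';') are ignored.
        if (!entry.empty() && entry.find(needle) != std::string::npos) {
            hits.push_back(entry);
        }
        start = end + 1;
    }
    return hits;
}
```

In practice the input would come from `std::getenv("PATH")` (declared in `<cstdlib>`; check it for `nullptr` first) with a needle such as `"CUDA\\v9"`. Note that Windows builds the effective PATH from both the system and user variables, so both are worth inspecting.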

So far I haven’t found it in my PATH variable, but I’ll double-check everything and consult my colleagues. I will let you know what was causing the trouble once we hopefully figure it out.