Vector functions for OptiX 7 with NVRTC?

Hi,

In OptiX 6, we used optixu_math for vector operations, and it was convenient with NVRTC.
Are there / will there be alternatives for OptiX 7?

Thanks

I learned what the real problem is from here:
https://devtalk.nvidia.com/default/topic/1060884/optix/optix-7-samples-using-nvrtc/

Apparently, vec_math.h in OptiX 7 is not NVRTC-friendly because it uses cmath and cstdlib…

Well, it’s quite late, but this could help someone who hits this thread.

vec_math.h is not friendly with NVRTC because of cmath, which is host-only code (the NVRTC documentation is quite strict that host code is not allowed), and beyond that, on MSVS it couldn’t even be found (it lives in the Windows Kit); I don’t know what happens with GCC.
I placed all the functions in an #ifdef __CUDACC__ / #endif block and replaced <cmath> and <cstdlib> with <cuda_runtime.h>: it pulls in math functions that work on the device, and as far as I know all cmath functions used in vec_math.h exist as device functions in CUDA.
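
For illustration, a minimal sketch of that kind of patch (the two functions below are just my examples, not the actual vec_math.h contents):

#include <cuda_runtime.h>  // replaces <cmath>/<cstdlib>; provides device math like rsqrtf and make_float3

#ifdef __CUDACC__  // NVRTC compiles device code only, so guard everything

static __forceinline__ __device__ float3 operator+( const float3& a, const float3& b )
{
    return make_float3( a.x + b.x, a.y + b.y, a.z + b.z );
}

static __forceinline__ __device__ float3 normalize( const float3& v )
{
    const float invLen = rsqrtf( v.x * v.x + v.y * v.y + v.z * v.z );
    return make_float3( v.x * invLen, v.y * invLen, v.z * invLen );
}

#endif // __CUDACC__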

Though I really don’t understand why the OptiX samples were configured to use NVRTC but, it seems, were never tested with it.

My problem for now is that, even after configuring the shaders to compile with NVRTC, I still can’t debug / hit breakpoints in my PTX code :(

1.) Have you enabled --generate-line-info (-lineinfo) inside the NVRTC options? (See the NVRTC sketch after the code below.)
2.) I’m actually not sure whether OptiX 7 still can’t handle PTX code compiled with the debug flag --device-debug (-G). I never set that.
3.) Have you used the proper OptixCompileOptimizationLevel and OptixCompileDebugLevel values inside the OptixModuleCompileOptions and OptixPipelineLinkOptions?
Something like this:

  OptixModuleCompileOptions mco;
  memset(&mco, 0, sizeof(OptixModuleCompileOptions) );

  mco.maxRegisterCount  = 0;                                  // No explicit limit.
#if USE_DEBUG_OPTIONS
  mco.optLevel          = OPTIX_COMPILE_OPTIMIZATION_LEVEL_0; // No optimizations.
  mco.debugLevel        = OPTIX_COMPILE_DEBUG_LEVEL_FULL;     // Full debug.
#else
  mco.optLevel          = OPTIX_COMPILE_OPTIMIZATION_LEVEL_3; // All optimizations (the default).
  mco.debugLevel        = OPTIX_COMPILE_DEBUG_LEVEL_LINEINFO; // For profiling. Otherwise OPTIX_COMPILE_DEBUG_LEVEL_NONE;
#endif

  ...

  OptixPipelineLinkOptions plo;
  memset(&plo, 0, sizeof(OptixPipelineLinkOptions) );

  plo.maxTraceDepth = 2;
#if USE_DEBUG_OPTIONS
  plo.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_FULL;
#else
  plo.debugLevel = OPTIX_COMPILE_DEBUG_LEVEL_LINEINFO; // For profiling. Otherwise OPTIX_COMPILE_DEBUG_LEVEL_NONE;
#endif
  plo.overrideUsesMotionBlur = 0;
  ...
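
And for point 1, a minimal NVRTC sketch of compiling a shader to PTX with line info could look like this (the architecture and include path are assumptions, adjust them for your setup):

#include <nvrtc.h>
#include <string>

// Hypothetical helper: compile CUDA source to PTX with line info.
// Adjust --gpu-architecture and the include path for your own setup.
std::string compileToPTX( const char* source, const char* name )
{
    nvrtcProgram prog = nullptr;
    nvrtcCreateProgram( &prog, source, name, 0, nullptr, nullptr ); // error check omitted for brevity

    const char* options[] =
    {
        "--gpu-architecture=compute_60",
        "--relocatable-device-code=true",
        "--generate-line-info",          // -lineinfo, needed for profiling and source correlation
        "-I/path/to/optix/include"       // assumption: your OptiX SDK include directory
    };

    if( nvrtcCompileProgram( prog, 4, options ) != NVRTC_SUCCESS )
    {
        // nvrtcGetProgramLogSize()/nvrtcGetProgramLog() contain the compiler messages.
        nvrtcDestroyProgram( &prog );
        return std::string();
    }

    size_t size = 0;
    nvrtcGetPTXSize( prog, &size );
    std::string ptx( size, '\0' );
    nvrtcGetPTX( prog, &ptx[0] );
    nvrtcDestroyProgram( &prog );
    return ptx;
}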

With Nsight Compute 2019.4 that should allow at least OptiX 7 kernel profiling with CUDA code and SASS code view.
I have not used any debug stepping with that, but I heard from a customer that it worked with Nsight Visual Studio Edition, just not very fast.

Hi, Detlef!

  1. Actually, this option was enabled by default, and I left it on.
  2. Tried with this option both on and off.
  3. Well, thank you for reminding me about these options! I thought I had already made them fully debug… Though even with these changes, nothing happens :(

I have Nsight 2019.4, CUDA 10.1, OptiX 7.0. I’m starting my MSVS project with “Start CUDA Debugging (Next-Gen)” with breakpoints at several places in the ray generation program, and I skip everything until the optixLaunch(…) call.
I tried enabling/disabling the display of disassembly in the MSVS debugging options.
Am I doing everything right?

I even tried starting the program directly from the Nsight Compute app, but it doesn’t help, as I can barely understand what is going on from the device-function call stack alone.

dhart mentioned something in a pinned message about making a video about using the debugger, and that was back in August. Do you know whether we should expect something? Because so far my research into this problem has led me nowhere.

Thank you for your help, I really appreciate it!

As I understand it, the video David mentioned is the one of his session at SIGGRAPH 2019.
I’ll look into it; it might be helpful for others looking for answers too:
https://developer.nvidia.com/siggraph/2019/video/sig915

Hey tokareuv,

We did shoot a video at last SIGGRAPH about using the debugger specifically, and that’s what I was referring to in my earlier post. Unfortunately, I found out later that the audio track was unusable: super noisy, and you can’t hear the speaker. We are discussing re-shooting the video, but I don’t know if or when that will happen. I’m sorry! We had (have) every intention of getting the video out. In the meantime, we are happy to answer questions here about how to use the profiler and debugger. Is the current problem that Nsight Compute won’t break on your breakpoints? What GPU are you using when running Nsight Compute?

–
David.

Nice to see you, David! :)
Well, yes. As I mentioned before, I’ve set up NVRTC shader compilation so it works, and set the full debug level in both the pipeline and module compile options. I’ve just updated CUDA to 10.2, though I don’t think it will make any drastic difference.

My GPU is a GeForce GTX 1050, so it is not RTX, but as far as I understood, that makes no difference for Nsight debugging?

UPD: After the CUDA update (from 10.1 to 10.2), breakpoint hits are suddenly possible. Either it fixed some error in the Nsight installation for VS, or… I don’t even know what is happening.

Well, the real question is: is it possible to watch local variables with the Next-Gen CUDA debugger, and can I pin the thread I want to debug (meaning, the pixel by coordinate)?
Of course, I can leave a lot of prints and an if statement to skip undesirable pixels, but it would be a lot more helpful if we could check this somehow during debugging. :)

Thank you, and I’m waiting for your response!

UPD2: Found the Warp Info / Lanes / GPU Registers windows. While it is now a little clearer how to debug different threads, it is still not clear how to tell which thread corresponds to which pixel, but I think that is something I can handle (still, I would appreciate any tips or links I might have missed!), and watching local variables is still an open question.

Hi, yes GTX 1050 is fine, there is no difference for Nsight debugging GTX vs RTX.

By “watch” I assume you mean setting a breakpoint that is conditioned on a variable being equal to some value you choose?

I haven’t been able to try this today in OptiX, but the Nsight VSE manual says you can set conditional breakpoints. If this works in OptiX programs, you will probably have to create a local variable holding your pixel coordinates (via optixGetLaunchIndex()) in order to get it to break on a pixel of your choice.

https://docs.nvidia.com/nsight-visual-studio-edition/Nsight_Visual_Studio_Edition_User_Guide.htm#Set_GPU_Breakpoints.htm

See the section “Conditional Breakpoints”

If for some reason that doesn’t work for you right now, one way to get similar behavior is to add local variables with the value from optixGetLaunchIndex(), and an if block that checks for when the launch index matches the pixel you’re interested in. Then you can place an unconditional breakpoint inside the block and catch the pixel you want in the debugger. The debugger will stop in the specific thread you’re interested in, but you can also examine the other threads in the warp, if you want.
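
A minimal sketch of that pattern (the pixel coordinates and the program name below are placeholders, not from an actual sample):

#include <optix.h>
#include <cstdio>

extern "C" __global__ void __raygen__rg()
{
    const uint3 idx = optixGetLaunchIndex();

    // Hypothetical target pixel, hard-coded for illustration.
    if( idx.x == 256 && idx.y == 128 )
    {
        // Place an unconditional breakpoint on the printf below; only the
        // thread handling pixel (256, 128) will stop here.
        printf( "debug pixel (%u, %u) reached\n", idx.x, idx.y );
    }

    // ... rest of the ray generation program ...
}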

I always like to add some debug pixel coordinates to my launch parameters and a boolean flag. That way, I can set the flag and the coordinates in my mouse callback by clicking on the screen, in order to add debug prints or breakpoints into my OptiX shaders. A macro like the one below comes in pretty handy for debug prints; it could be tweaked to trigger code with a breakpoint:

#define print_pixel(...)                                                                     \
{                                                                                            \
    const uint3  idx__ = optixGetLaunchIndex();                                              \
    if( params.debug && idx__.x == params.debug_pixel.x && idx__.y == params.debug_pixel.y ) \
        printf( __VA_ARGS__ );                                                               \
}
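
On the host side, the wiring could look something like this (just a sketch assuming GLFW; debug and debug_pixel are fields you’d add to your own launch-parameter struct):

#include <GLFW/glfw3.h>
#include <vector_types.h>  // uint2

// Hypothetical launch-parameter struct; yours will have more fields.
struct Params
{
    bool  debug;
    uint2 debug_pixel;
};
static Params params;  // host copy, uploaded to the device before each optixLaunch

static void mouseButtonCallback( GLFWwindow* window, int button, int action, int mods )
{
    if( button == GLFW_MOUSE_BUTTON_LEFT && action == GLFW_PRESS )
    {
        double x, y;
        glfwGetCursorPos( window, &x, &y );
        params.debug         = true;
        params.debug_pixel.x = static_cast<unsigned int>( x );
        params.debug_pixel.y = static_cast<unsigned int>( y );
    }
}

Depending on where your framebuffer origin is, you may need to flip the y coordinate before comparing it against the launch index.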

–
David.

Thank you, David, for your answer!
By “watch” I mean a local variable whose value is accessible without a print and can be watched for changes during line-by-line code execution. Though I then found in the documentation that this seems to have been possible in the legacy debugger but isn’t now, so there is no way to use watches in exactly that way. But I think I can find my way around it with some macro or another approach to quickly print all desired values on demand :)

Well, most of this I have achieved already, but it’s good to know that it’s the only way (so I won’t look for another solution).
As for conditional breakpoints, I tried once before to stop on an exact launch index and it failed, but maybe I messed up somewhere and can still get it to work.

As for the “print_pixel” macro, something like it had come to my mind, though I hadn’t thought of driving it through the launch parameters; controlling it from the host is nice, thank you :)

It seems that, for now, I have received all the answers I needed about debugging, thank you very much!
I also hope this thread will help someone else avoid struggling through all the different documentation.