Latest NVIDIA OptiX Renders Ray Tracing Faster Than Ever Before

Originally published at: Latest NVIDIA OptiX Renders Ray Tracing Faster Than Ever Before | NVIDIA Technical Blog

NVIDIA OptiX 7.4 introduces parallel compilation, temporal denoising of arbitrary values, improvements to the demand loading library, enhancements to ray payloads, Catmull-Rom curves, and decreased memory for curves.

My only interest is VR. I no longer play games in flat (or pancake) mode, so whenever NVIDIA announces something, could you please mention whether it will improve the VR experience? I have a 2080 Ti, which I bought especially for VR, and it works well. I would have upgraded to a 30-series card (if they had been available), but from what I have been able to work out, it would not have given a significant improvement.

Thank you.


Hello, thank you for sharing, but I have no idea how to get, install, and use the program to practice. Is there a tutorial or a link that shows how it works?
Thank you.

Thanks for the input! OptiX has a wide set of use cases from professional rendering to simulation. OptiX can even be used to simulate sound propagation, which can offer a much more immersive experience in VR.

The OptiX SDK comes with a number of samples, ranging from simple to advanced, that should serve as a great place to start. Additionally, we have a number of talks at GTC this week that would be valuable to check out. “[A31547]: RTX Ray Tracing 101: Learn How to Build Ray-tracing Applications” would be the best one to watch for beginners.


First, a quick question: is it possible that the OptiX user’s guide hasn’t been updated? The examples in there still mention the limit of 8 payload values.

One thing I find surprising about the OptiX 7.4 SDK is that all ray tracing calls (no matter how many payloads they use) are now mapped to a single PTX intrinsic, _optix_trace_typed_32, whereas previous versions used finer-grained wrappers like _optix_trace_1 … _optix_trace_8. This makes the wrappers in include/internal/optix_7_device_impl.h super-unwieldy, both in terms of the C++ code and the PTX code that is generated. Is this a good idea? Even the most basic one reads:

static __forceinline__ __device__ void optixTrace( OptixTraversableHandle handle,
                                                   float3                 rayOrigin,
                                                   float3                 rayDirection,
                                                   float                  tmin,
                                                   float                  tmax,
                                                   float                  rayTime,
                                                   OptixVisibilityMask    visibilityMask,
                                                   unsigned int           rayFlags,
                                                   unsigned int           SBToffset,
                                                   unsigned int           SBTstride,
                                                   unsigned int           missSBTIndex )
{
    float        ox = rayOrigin.x, oy = rayOrigin.y, oz = rayOrigin.z;
    float        dx = rayDirection.x, dy = rayDirection.y, dz = rayDirection.z;
    unsigned int p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21,
        p22, p23, p24, p25, p26, p27, p28, p29, p30, p31;
    asm volatile(
        "call"
        "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%"
        "29,%30,%31),"
        "_optix_trace_typed_32,"
        "(%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%50,%51,%52,%53,%54,%55,%56,%57,%58,%"
        "59,%60,%61,%62,%63,%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80);"
        : "=r"( p0 ), "=r"( p1 ), "=r"( p2 ), "=r"( p3 ), "=r"( p4 ), "=r"( p5 ), "=r"( p6 ), "=r"( p7 ), "=r"( p8 ),
          "=r"( p9 ), "=r"( p10 ), "=r"( p11 ), "=r"( p12 ), "=r"( p13 ), "=r"( p14 ), "=r"( p15 ), "=r"( p16 ),
          "=r"( p17 ), "=r"( p18 ), "=r"( p19 ), "=r"( p20 ), "=r"( p21 ), "=r"( p22 ), "=r"( p23 ), "=r"( p24 ),
          "=r"( p25 ), "=r"( p26 ), "=r"( p27 ), "=r"( p28 ), "=r"( p29 ), "=r"( p30 ), "=r"( p31 )
        : "r"( 0 ), "l"( handle ), "f"( ox ), "f"( oy ), "f"( oz ), "f"( dx ), "f"( dy ), "f"( dz ), "f"( tmin ),
          "f"( tmax ), "f"( rayTime ), "r"( visibilityMask ), "r"( rayFlags ), "r"( SBToffset ), "r"( SBTstride ),
          "r"( missSBTIndex ), "r"( 0 ), "r"( p0 ), "r"( p1 ), "r"( p2 ), "r"( p3 ), "r"( p4 ), "r"( p5 ), "r"( p6 ),
          "r"( p7 ), "r"( p8 ), "r"( p9 ), "r"( p10 ), "r"( p11 ), "r"( p12 ), "r"( p13 ), "r"( p14 ), "r"( p15 ),
          "r"( p16 ), "r"( p17 ), "r"( p18 ), "r"( p19 ), "r"( p20 ), "r"( p21 ), "r"( p22 ), "r"( p23 ), "r"( p24 ),
          "r"( p25 ), "r"( p26 ), "r"( p27 ), "r"( p28 ), "r"( p29 ), "r"( p30 ), "r"( p31 )
        : );
    (void)p0, (void)p1, (void)p2, (void)p3, (void)p4, (void)p5, (void)p6, (void)p7, (void)p8, (void)p9, (void)p10, (void)p11,
        (void)p12, (void)p13, (void)p14, (void)p15, (void)p16, (void)p17, (void)p18, (void)p19, (void)p20, (void)p21,
        (void)p22, (void)p23, (void)p24, (void)p25, (void)p26, (void)p27, (void)p28, (void)p29, (void)p30, (void)p31;
}

My group builds JIT compilers generating OptiX code that often contains many ray tracing calls, and it seems scary to generate that many unused/temporary variables when doing a few of these in a kernel.
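To make the operand count in the wrapper quoted above concrete, here is a minimal host-side C++ sketch that rebuilds the same inline-asm call template. It is not SDK code: operand_list and trace_call_template are hypothetical helper names, and the split into 32 payload outputs plus 17 non-payload inputs is simply read off the excerpt.

```cpp
#include <string>

// Hypothetical helper: builds "(%first,...,%last)", the placeholder list
// used by the inline-asm statement in the quoted wrapper.
std::string operand_list(int first, int count) {
    std::string s = "(";
    for (int i = 0; i < count; ++i) {
        if (i > 0) s += ",";
        s += "%" + std::to_string(first + i);
    }
    s += ")";
    return s;
}

// Counts taken from the excerpt, not from any documentation:
// 32 payload outputs (p0..p31), and 17 inputs that precede the payload
// inputs (two literal zeros, the handle, six ray floats, tmin, tmax,
// rayTime, visibility mask, ray flags, SBT offset, SBT stride, miss index).
constexpr int kPayloadSlots = 32;
constexpr int kNonPayloadInputs = 17;

// Hypothetical helper: the call template every trace call expands to,
// regardless of how many payload values the caller actually uses.
std::string trace_call_template() {
    return "call" + operand_list(0, kPayloadSlots) +
           ",_optix_trace_typed_32," +
           operand_list(kPayloadSlots, kNonPayloadInputs + kPayloadSlots) +
           ";";
}
```

Under these assumptions the template always carries 81 operands (%0 through %80), which is the overhead a code generator pays per trace call even when no payload is used.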


"... can offer a much more immersive experience in VR."

Hi @akanell! OptiX is a pure ray tracing engine, not a renderer; developers can use it to implement one. My main concern is real-time rendering in VR, and unfortunately there is no sample code or tutorial for that.

Hi @wenzel.jakob, what are the benefits of using a JIT compiler over the NVCC compiler? Is it faster than nvcc?

@_Bi2022: Speed of compilation is not the motivating factor. JIT compilation is useful in applications where you don’t even know what code to execute until runtime, for example when the user is writing it in a different language like Python, or when there are program transformations like differentiation that change the code at runtime. See this paper for details: RGL | Dr.Jit: A Just-In-Time Compiler for Differentiable Rendering
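As a toy illustration of that last point (and not a description of how Dr.Jit works), the following C++ sketch builds an expression tree at runtime, differentiates it symbolically, and only then emits source text for a device function. All names here (Expr, diff, emit, device_function_source) are hypothetical, and the step of handing the generated string to a runtime compiler is left out.

```cpp
#include <memory>
#include <string>

// Tiny expression tree over a single variable x.
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
    enum class Kind { Var, Const, Add, Mul } kind;
    int value;         // used by Const only
    ExprPtr lhs, rhs;  // used by Add and Mul only
};

ExprPtr var() { return std::make_shared<Expr>(Expr{Expr::Kind::Var, 0, nullptr, nullptr}); }
ExprPtr constant(int v) { return std::make_shared<Expr>(Expr{Expr::Kind::Const, v, nullptr, nullptr}); }
ExprPtr add(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Expr::Kind::Add, 0, a, b}); }
ExprPtr mul(ExprPtr a, ExprPtr b) { return std::make_shared<Expr>(Expr{Expr::Kind::Mul, 0, a, b}); }

// Program transformation: d/dx using the sum and product rules.
// No simplification is attempted.
ExprPtr diff(const ExprPtr& e) {
    switch (e->kind) {
        case Expr::Kind::Var:   return constant(1);
        case Expr::Kind::Const: return constant(0);
        case Expr::Kind::Add:   return add(diff(e->lhs), diff(e->rhs));
        case Expr::Kind::Mul:   return add(mul(diff(e->lhs), e->rhs),
                                           mul(e->lhs, diff(e->rhs)));
    }
    return nullptr;
}

// Code generation: turn the tree into a fully parenthesized expression.
std::string emit(const ExprPtr& e) {
    switch (e->kind) {
        case Expr::Kind::Var:   return "x";
        case Expr::Kind::Const: return std::to_string(e->value) + ".0f";
        case Expr::Kind::Add:   return "(" + emit(e->lhs) + " + " + emit(e->rhs) + ")";
        case Expr::Kind::Mul:   return "(" + emit(e->lhs) + " * " + emit(e->rhs) + ")";
    }
    return "";
}

// The source text only exists once the tree is known, i.e. at runtime.
std::string device_function_source(const std::string& name, const ExprPtr& e) {
    return "__device__ float " + name + "(float x) { return " + emit(e) + "; }";
}
```

For example, building x * x + 3 with add(mul(var(), var()), constant(3)) and passing diff of that tree to device_function_source yields a function body that could not have been written ahead of time, since it depends on what the user constructed.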
