How does optix code compilation work?

Hi,

I’m trying to build an application that includes OptiX and other CUDA libraries like cuFFT and cuBLAS, so I need to understand how the compilation works.

My impression is that OptiX program code has to be compiled into PTX code and further compiled at run time. Other non-OptiX .cpp and .cu code can be compiled normally into cubins.

What I don’t get is how the final just-in-time compilation and loading of the PTX code works. Do I compile and link the non-OptiX code into an executable, let it know where the OptiX PTX code is, run it, and it will load and further compile the PTX code?

Thanks.

Huy.

Hi Huy,

Do I compile and link the non-OptiX code into an executable, let it know where the OptiX PTX code is, run it, and it will load and further compile the PTX code?

That’s exactly right. You compile your OptiX programs into PTX (or OptiX-IR), typically using nvcc at build time, and then at run time your program calls optixModuleCreateFromPTX() or optixModuleCreateFromPTXWithTasks() one or more times, which compiles your intermediate code into binary form. You end up with one or more OptiX modules and link them together with optixPipelineCreate().
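As a minimal sketch of the run-time side: the PTX file that nvcc wrote at build time just needs to be read into a string and handed to the module-creation call. The helper name `readPtx` and the file name below are hypothetical, and the OptiX call itself is shown only as a comment since it needs the SDK:

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a PTX file produced by nvcc at build time
// into a string, ready to hand to optixModuleCreateFromPTX().
std::string readPtx(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("Cannot open PTX file: " + path);
    std::ostringstream ss;
    ss << in.rdbuf();  // slurp the whole file into the string stream
    return ss.str();
}

// At run time (sketch, requires the OptiX SDK to actually compile):
//   std::string ptx = readPtx("myPrograms.ptx");
//   optixModuleCreateFromPTX(context, &moduleCompileOptions,
//                            &pipelineCompileOptions,
//                            ptx.c_str(), ptx.size(),
//                            log, &logSize, &module);
```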

This entire process is specific to OptiX and doesn’t really overlap with other CUDA programs or libraries. The build & link process for anything else non-OptiX will be the same as it was before; there’s essentially no overlap as far as the build is concerned.

If you can keep your CUDA and OptiX code cleanly separated, then you shouldn’t really have anything tricky to deal with. If you want to share headers and/or code between the OptiX and CUDA sides, then you can end up with some amount of work to keep both compilers happy. It would be recommended and easiest, if you can, to keep the OptiX and CUDA code separated and not try to share symbols.

Note I’m only talking about mixing the source code together. This has no bearing on whether you can mix OptiX launches and CUDA kernels at run time, nor whether you can pass buffer pointers back and forth; those things will work just fine in any case.


David.

I’m building the SDK examples, and when CUDA_NVRTC_ENABLED is ON in CMakeLists.txt, I don’t see any PTX files created. Only when it’s OFF do I see them in build/lib/ptx. Why is that? Where is the PTX code when it’s ON?

With NVRTC, your source code is compiled into PTX on the fly, “just in time”; NVRTC replaces nvcc as the C++ → PTX compiler in your build. In the JIT case the PTX exists only in memory, and you pass it directly to optixModuleCreateFromPTX() as a string without having to read or write a PTX file.

You can study the run-time build workflow if you look at the source in sutil.cpp.

The reason to use nvrtc is when you want to be able to change your OptiX shaders and re-run your application without having to run any build process. It’s usually just a minor convenience, and would typically be used for internal/private projects. The reason that most people use nvcc for professional applications is to avoid having to ship their OptiX shader source code.

The rest of the compilation process after you build the PTX is the same in either case.


David.

thank you for the response. i’ll try and probably have more questions later.

Please read this related thread and follow the links in there for more information on NVCC and NVRTC compilation for OptiX programs:
https://forums.developer.nvidia.com/t/nvrtc-missing-stdint-h/146318

The reason that most people use nvcc for professional applications is to avoid having to ship their OptiX shader source code.

When using NVRTC, you would also need the OptiX headers and the CUDA headers, which means both SDKs would need to be installed on the target machine.
NVRTC can only generate PTX device code. Last time I checked it was about three times faster than NVCC because it doesn’t write any files.
I’ve used it in the past to generate high-level CUDA source code for materials at runtime which then got translated to PTX for OptiX on the fly.


To make sure that I got it correctly: NVRTC is not a program that can be called from the terminal like nvcc (since I don’t see it in /usr/local/cuda/bin where nvcc is), but a library to be linked, right?

So when compiling OptiX device code, I still invoke nvcc but link it with -lnvrtc?

NVRTC is not a program that can be called from the terminal like nvcc but a library to be linked, right?

Yes, NVRTC is a library with a few entry points with which you compile CUDA device source code to PTX device source code inside your application at runtime.
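The typical entry-point sequence looks roughly like this. A sketch only, assuming the CUDA toolkit’s nvrtc.h is available; the source string, file name, and architecture option are examples, and error checking is elided:

```cpp
#include <nvrtc.h>
#include <string>

// cudaSource holds the CUDA C++ device code as a string.
nvrtcProgram prog = nullptr;
nvrtcCreateProgram(&prog, cudaSource, "myPrograms.cu",
                   0, nullptr, nullptr);          // optional headers go here
const char* options[] = { "--gpu-architecture=compute_60" };
nvrtcCompileProgram(prog, 1, options);            // CUDA C++ -> PTX
size_t ptxSize = 0;
nvrtcGetPTXSize(prog, &ptxSize);
std::string ptx(ptxSize, '\0');
nvrtcGetPTX(prog, &ptx[0]);                       // PTX stays in memory
nvrtcDestroyProgram(&prog);
// ptx can now be passed straight to optixModuleCreateFromPTX().
```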

(I don’t know how some of the CUDA toolkit files are named under Linux, so the following is how it looks under Windows:)

On the host side you need to include the nvrtc.h header (inside your CUDA/<version>/include folder) and link against the CUDA export library named nvrtc.lib (inside the CUDA/<version>/x64/lib folder) to be able to compile an application making NVRTC API calls.

Which calls are needed can be found inside the OptiX SDK example framework when searching for nvrtc.

That export library only contains the interface of the dynamic link libraries which implement the actual NVRTC compiler and a precompiled standard library it needs.
These are located inside the CUDA/<version>/bin folder and are named with an nvrtc-prefix and the CUDA version, e.g. for Windows CUDA 10.1 they are named nvrtc64_101_0.dll and nvrtc-builtins64_101.dll.
These need to be redistributed along with the application.

As explained in the links I posted above, all headers which you’d need to compile the CUDA code would also be required on the target machine (and since license terms forbid shipping these with your application, the end user would need to install CUDA and OptiX SDKs on his/her own.)

Since NVRTC can only compile device code, care needs to be taken to never include any host compiler includes inadvertently (also described inside the linked threads), because you cannot expect a target system to have any compiler installed, at least under Windows.

So when compiling an optiX device code, I still invoke nvcc but link it with -lnvrtc?

Not sure I understand the question. If all your CUDA code is translated to PTX with NVRTC you wouldn’t need NVCC and vice versa. You can also compile all CUDA device code which never changes with NVCC during build time of your project and only translate dynamically generated CUDA sources with NVRTC to PTX at runtime.

If you do not have any need to generate CUDA device code at runtime, there is also no need to use NVRTC at all.
You should simply build everything with NVCC and ship the translated PTX code with your application.

That is what most applications do and what all OptiX SDK examples do when you disable NVRTC inside the CMake settings.

The -lnvrtc is a linker flag that I think I need to include when compiling the host code (I’m running on Linux).

Isn’t -lnvrtc just telling the host linker (gcc, not nvcc) to link against the NVRTC library to be able to resolve the NVRTC API entry points?
Again you wouldn’t need to link against the NVRTC export library at all when not calling any nvrtc entry points inside your application’s host code.

Ok, I think I got OptiX to compile and work with a CUDA application. My question now is: can the ray parameter struct, Params, in the SDK be templated? If so, how would that change the compilation process?

You want the structure which is used as launch parameter block in constant CUDA memory to be a template?
Why? Based on what template arguments?
How many different launch parameter structures would you need and how would that change the device programs accessing that data?

Usually you would implement a specific launch parameter structure which exactly matches to how the device programs inside one or multiple pipelines are coded. That is something you hardcode once and never touch again.

Often it’s not actually necessary to have different structs at all.
For example, if you need some pointers inside the types inside the launch parameter structure to point to memory of different formats, you can define the pointer as CUdeviceptr and reinterpret that to the desired type dynamically. (Mind that CUDA requires specific byte alignments for different vector types. You’ll get misaligned access errors when not adhering to the proper alignment.)
Example code here where I switch between float4 and half4 buffers with a compile time switch, but that could also be handled with a runtime parameter:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_denoiser/shaders/system_parameter.h#L43
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_denoiser/shaders/raygeneration.cu#L233
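The pattern can be illustrated host-side with a stand-in for CUdeviceptr (in the CUDA driver API it is an integer wide enough to hold a device address). The struct names, the `useHalf` flag, and the element types below are all made up for the illustration; on the GPU the half type and the half-to-float conversion would be the real CUDA ones:

```cpp
#include <cstdint>

// Stand-in for CUdeviceptr from the CUDA driver API.
using DevicePtr = unsigned long long;

struct Float4 { float    x, y, z, w; }; // 16-byte buffer element
struct Half4  { uint16_t x, y, z, w; }; // 8-byte stand-in for half4

// Hypothetical launch-parameter struct: one generic pointer plus a
// runtime flag saying which element format the buffer actually uses.
struct Params
{
    DevicePtr outputBuffer; // reinterpreted at access time
    int       useHalf;      // runtime switch between formats
};

// Device-code-style access: reinterpret the generic pointer to the
// concrete element type on demand.
float firstX(const Params& p)
{
    if (p.useHalf)
    {
        const Half4* buf = reinterpret_cast<const Half4*>(p.outputBuffer);
        return static_cast<float>(buf[0].x); // would be half -> float on the GPU
    }
    const Float4* buf = reinterpret_cast<const Float4*>(p.outputBuffer);
    return buf[0].x;
}
```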

You also don’t need different launch parameter structs per pipeline just because the pipelines access different fields; each can read only what it needs from one bigger structure.

Note that the constant CUDA memory the launch parameters reside in is limited to 64 kB. Means whatever big data you need to access globally, only store a pointer to that inside the launch parameters.
Make the launch parameter structure as small as possible. Place fields in them according to their alignment requirements (in decreasing alignment) to prevent the compiler from adding unnecessary padding.
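A small illustration of why the field order matters; the struct names are made up and the sizes are the typical ones on 64-bit platforms:

```cpp
// Fields ordered badly: the 8-byte-aligned double forces the compiler
// to insert padding around the 1-byte fields.
struct BadParams
{
    char   flag;  // 1 byte + 7 bytes padding
    double scale; // 8 bytes
    char   mode;  // 1 byte + 7 bytes tail padding
};                // typically 24 bytes

// Same fields ordered by decreasing alignment: no internal padding.
struct GoodParams
{
    double scale; // 8 bytes
    char   flag;  // 1 byte
    char   mode;  // 1 byte + 6 bytes tail padding
};                // typically 16 bytes
```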

Other than that, it’s a struct defined in a header and used inside C++ code. I don’t see why you shouldn’t be able to implement that as a template but I have never attempted this because that’s usually unnecessary.

Maybe you’re interested in launch parameter specialization? This is different than templating, but it allows you to conditionally compile some of your launch parameters as if they are constant values, thus allowing the compiler to elide them, and speed up your device code.


David.

ok i think i got the idea.

speaking of types, can optix do double precision, for acceleration structs, and launch parameters, like ray origins, directions et al…? i only see float

also, i see the launch parameter is defined as __constant__ Params params in device code, while on the host it’s d_params and copied from the cpu by a regular cudaMemcpy. besides the different names, i thought constant memory is copied with cudaMemcpyToSymbol. is something else happening under the hood?

speaking of types, can optix do double precision, for acceleration structs, and launch parameters, like ray origins, directions et al…? i only see float

No. None of the available ray tracing APIs (OptiX, DXR, Vulkan Raytracing) support double precision data in acceleration structures, the ray definition, the transforms, or any other built-in functionality. That is all 32-bit floating point precision.

What you put into the launch parameters or any other developer defined data structure or how you implement your device code is completely your choice. Means it’s possible to use doubles in your OptiX device code but unless there are quantifiable precision requirements it’s definitely not recommended to do so simply for performance reasons.

Note that double precision performance on standard desktop and mobile GPUs is dramatically slower than single precision performance, except on some compute-only products which in turn have no hardware RT cores. (Your V100 is one of them.)
You can query the single to double precision performance ratio via the CUDA runtime API cudaGetDeviceProperties() or CUDA driver API cuDeviceGetAttribute() calls.
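With the CUDA runtime API, the query is a one-liner; a sketch assuming a CUDA-capable machine with device 0:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0); // query device 0
    // How many single precision operations run in the time of one
    // double precision operation on this GPU:
    printf("FP32:FP64 performance ratio = %d:1\n",
           prop.singleToDoublePrecisionPerfRatio);
    return 0;
}
```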

The forum has a search feature in the top right which can be limited to sub-forums when starting the search, for example, on the OptiX forum view. Please have a look into these previous discussions about that topic which explain some options. Look out for comments on watertight intersections in the results as well.
https://forums.developer.nvidia.com/search?q=double%20precision%20%23visualization%3Aoptix

also, i see the launch parameter is defined as __constant__ Params params in device code, while on the host it’s d_params and copied from the cpu by a regular cudaMemcpy. besides the different names, i thought constant memory is copied with cudaMemcpyToSymbol. is something else happening under the hood?

Yes, OptiX handles that for you. That’s why you need to provide the launch parameter variable name in OptixPipelineCompileOptions::pipelineLaunchParamsVariableName.
https://raytracing-docs.nvidia.com/optix7/api/struct_optix_pipeline_compile_options.html#a716d5238c52743e20dce1e92575c6802
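For example (a sketch; the variable naming follows the SDK samples, and the surrounding context/pipeline setup is omitted):

```cpp
OptixPipelineCompileOptions pipelineCompileOptions = {};
// Must match the name of the __constant__ variable in the device code:
//   extern "C" __constant__ Params params;
pipelineCompileOptions.pipelineLaunchParamsVariableName = "params";
// On the host, the filled Params struct is copied into a device buffer
// (d_params) with a regular cudaMemcpy, and that buffer's address is
// passed to optixLaunch(), which binds it to the named variable.
```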

the reason i’m asking for double precision is because i only want to do 2d graphics, polygonal objects, but from what i understand, in optix the way to represent polygons is piecewise-linear curves with thickness.

so naturally, i set everything up on the xy plane, e.g. the acceleration structures, ray origins and directions et al. all have zeros for z coordinates, and with very small thickness (2d polygons’ edges have zero thickness). but i notice that if i set the thickness smaller than 3e-4, ray-object intersections/hits are wrong, as if the thickness is so small that some rays miss objects they should hit.

The curve primitives in OptiX are not 2D. They are round 3D shapes, like cylinders or the volume built by sweeping a sphere with varying radius along a 3D curve. Their main use case is the implementation of hair strands in 3D renderers.

A lot of care has been taken to make the curve intersection algorithms as precise as possible, but depending on your scene and camera setup there could of course be precision issues from the finite floating point representation.
But ray tracing does not work like rasterization, where line primitives affect whole pixels depending on specific rasterization rules (like diamond exit) that either set or don’t set a pixel on the screen.

Note that curve primitives can be a lot thinner than one pixel depending on the camera setup. Means if the sampling of the fragments making up one pixel on the screen is not dense enough, you will simply not be able to hit the curve with all rays because the curve can fall between the discrete fragment sample points. That is unrelated to floating point vs. double precision. (Nyquist theorem comes to mind.)
That is why curves are usually rendered by partitioning each pixel into very many fragments which each define a primary ray to accumulate the hit and miss results from geometric primitives accurately. The OptiX SDK curve examples show that.

i only want to do 2d graphics, polygonal objects,
i set everything up on the xy plane, e.g. the acceleration structures, ray origins and directions et al, all have zeros for z coordinates

Are you saying you want to shoot rays in the same plane as the geometry?
Otherwise, if you plan to project the polygon outlines onto some camera plane, the z-components of the ray origin and ray direction shouldn’t both be zero.

What exactly do you want to implement?

The following assumes this is about 2D graphics in the usual sense.
Why do you think that would require ray tracing?

Even if you did do that with ray tracing, you wouldn’t need curve primitives; you could also define your 2D polygonal objects by tessellating them into flat triangles instead, which would be even more efficient on RTX boards due to the watertight triangle intersection hardware.
Still, the sampling of very thin triangles would run into the exact same camera sampling issues.

I think a rasterizer could handle that a lot faster and would also allow overlapping geometry by render order without the need to handle depth separation (painter’s algorithm). Precision could be increased by using multisampling.

Maybe have a look at the NVIDIA Path Rendering SDK instead which uses a dedicated OpenGL extension GL_NV_path_rendering to implement hardware accelerated resolution independent vector graphics.
https://developer.nvidia.com/nv-path-rendering
https://developer.nvidia.com/gpu-accelerated-path-rendering