I’m writing a Path Tracer using OptiX 8. This is my first OptiX application and I (wrongly) started by porting my CPU Path Tracer after reading the Programming Guide and looking at a few examples.
The design I’ve decided on is the following: the Scene hierarchy is loaded on the host and transferred to the GPU. The Scene’s device pointer becomes the launchParams of the OptiX program, which in turn can access the Scene’s elements (e.g. the Integrator (I support several of them), the Sampler, the Mesh(es), etc.). I have only one module, bound to all entry points (rayGen, closest-hit, miss), which is compiled to .optixir. This module is responsible for forwarding the event to the Integrator instance, which can be obtained from launchParams.
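Roughly, the idea looks like this (a minimal sketch with hypothetical type and function names, not my actual classes):

#include <optix_device.h>

// Hypothetical minimal types standing in for my real Scene/Integrator classes.
struct Integrator { __device__ void Li(const uint3& idx) { /* ... */ } };
struct Scene      { Integrator* integrator; };

struct LaunchParams { Scene* scene; };   // scene is a device pointer set by the host

extern "C" __constant__ LaunchParams launchParams;

extern "C" __global__ void __raygen__main()
{
    const uint3 idx = optixGetLaunchIndex();
    // The module just forwards the event to the Integrator reachable from launch params.
    launchParams.scene->integrator->Li(idx);
}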
In my application code, I have some functions that use the OptiX device API and are supposed to be called only by the OptiX program. For example:
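Something along these lines (simplified, with hypothetical names; the real code lives in my Sampler class):

#include <optix_device.h>

// Simplified illustration: a method compiled in the application code, but
// calling the OptiX device API, which only the OptiX program should call.
struct Sampler {
    __device__ unsigned int pixelSeed() const
    {
        const uint3 idx = optixGetLaunchIndex();       // OptiX device API calls
        const uint3 dim = optixGetLaunchDimensions();
        return idx.y * dim.x + idx.x;
    }
};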
The OptiX program compilation to .optixir succeeds, but the compilation of the App fails:
ptxas /tmp/tmpxft_00016fed_00000000-7_Sampler.ptx, line 1295; error : Call to '_optix_get_launch_index_x' requires call prototype
ptxas /tmp/tmpxft_00016fed_00000000-7_Sampler.ptx, line 1300; error : Call to '_optix_get_launch_index_y' requires call prototype
ptxas /tmp/tmpxft_00016fed_00000000-7_Sampler.ptx, line 1305; error : Call to '_optix_get_launch_index_z' requires call prototype
I don’t fully understand how, or whether, the .optixir module can communicate back with the application code. I know I can share utilities by forcing functions to inline, or, with more effort, by using callable programs, but I’d appreciate a suggestion before proceeding.
What do you think about the design I’ve given to my program? Is it compatible with OptiX?
EDIT: how can I inspect the contents of an .optixir file and of a .ptx file?
How exactly are you compiling the .optixir file, and how are you loading it? Are you passing the -optix-ir flag (or -ptx when using PTX) to nvcc?
When you build the OptiX SDK samples, check out the “VERBOSE” CMake variables in the “advanced” variables section. These enable printing the full command line for each build step, which might help you see & debug the build process.
Here are a couple of old threads on the SDK build process that might help you (one of these has the same error message you posted):
I don’t yet understand your design ideas enough to comment on them. In particular, I would assume the integrators and samplers are mainly code(?). I’m not sure I understand what it means to access those from the scene, unless you’re referring to things like material parameters(?). The optixPathTracer sample is quite small, and a good one to play with, modify, and understand completely before going very deep into an OptiX based renderer design. From there, I’d recommend also studying the design of the OptiX Advanced Samples.
OptiX programs typically communicate with the host CPU application code via memory copies, and/or any of the various shared memory types (unified memory, pinned & mapped or unmapped memory, etc.). This is the same way that any CUDA application communicates with the host; think of an OptiX application as just CUDA with some ray tracing. Launch params you already know about; that’s a way to send a small amount of global data to your OptiX module. To share other kinds of data, the most common setup is to allocate memory using cudaMalloc(), copy using cudaMemcpy(), and read/write the appropriate copy of the buffer on either side. The easiest thing to study, perhaps, is the result image buffer written in any OptiX SDK sample: study how those are allocated, written by the device, copied, and read by the host. I hope I interpreted your question correctly and didn’t just say a bunch of stuff you already know.
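As a minimal sketch of that allocate / launch / copy-back pattern (all names here are made up, not from your code):

#include <optix.h>
#include <optix_stubs.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical launch-params struct shared with the device code.
struct Params { uchar4* image; unsigned int width, height; };

// Host side: allocate the result buffer, point the launch params at it,
// launch, then copy the image back so the host can read it.
std::vector<uchar4> renderFrame(OptixPipeline pipeline, const OptixShaderBindingTable& sbt,
                                CUstream stream, unsigned int width, unsigned int height)
{
    uchar4* d_image = nullptr;
    cudaMalloc(&d_image, width * height * sizeof(uchar4));

    Params params{ d_image, width, height };
    CUdeviceptr d_params = 0;
    cudaMalloc(reinterpret_cast<void**>(&d_params), sizeof(Params));
    cudaMemcpy(reinterpret_cast<void*>(d_params), &params, sizeof(Params), cudaMemcpyHostToDevice);

    optixLaunch(pipeline, stream, d_params, sizeof(Params), &sbt, width, height, 1);
    cudaDeviceSynchronize();

    std::vector<uchar4> h_image(width * height);   // host-side copy the application reads
    cudaMemcpy(h_image.data(), d_image, h_image.size() * sizeof(uchar4), cudaMemcpyDeviceToHost);

    cudaFree(d_image);
    cudaFree(reinterpret_cast<void*>(d_params));
    return h_image;
}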
The module’s compilation command (nvcc -optix-ir my_module.cu -o my_module.optixir) ran successfully, so I thought this design was feasible. However, I now think it would have given me issues later, when calling optixModuleCreate(), due to the missing implementation of integrate().
In particular I would assume the integrators and samplers are mainly code
Yes, they are pure code (no data) that has to be called from the OptiX-IR module.
OptiX programs typically communicate with the host CPU application code via memory copies
Regarding the “communication” from OptiX module to the Application, I was referring to being able to call device functions, compiled in the Application, from the OptiX module. Just like this example (*). So it’s not about data, but about running code. But I’m afraid there’s no “magic” permitting it…
About the error I posted, I think it’s because I was using OptiX functions (e.g. optixGetPayload_0()) in the Application (not in the OptiX module).
After encountering this issue and comparing my code with other people’s and with the examples, I found out I had to change the project structure. I refactored it, but I’m encountering a new issue (unrelated to the previous one; however, I’m keeping the same thread):
Now the code I’m trying to compile into an OptiX module is a set of many translation units (.cu files) and header files. By using -ptx (resp. -optix-ir), I’m able to compile these .cu files into .ptx (resp. .optixir) files.
In the OptiX 8 API, optixModuleCreate takes a single PTX (or OptiX-IR) input.
So my question is: is there a way to combine multiple PTX files into a single PTX file? I think of it as being analogous to combining multiple object files into a single static library.
After discussing with other people, we came up with a possible solution. I could use something like:
# Compile OptiX module files to PTX
nvcc -ptx file1.cu -o file1.ptx
nvcc -ptx file2.cu -o file2.ptx
nvcc -ptx file3.cu -o file3.ptx
nvcc -dlink -cubin file1.ptx file2.ptx file3.ptx -o all.cubin
nvcc -dlink -fatbin file1.ptx file2.ptx file3.ptx -o all.fatbin
# Decompile to a single .ptx file using cuobjdump? Not sure how yet...
Is there a simple way to do it?
The alternative would be to refactor the project to have only one translation unit (e.g. by turning all the .cu files into headers and having the main .cu file #include all of them). This is pretty much what I’ve seen the examples do.
About the error I posted, I think it’s because I was using OptiX functions (e.g. optixGetPayload_0()) in the Application (not in the OptiX module).
Aha, that could do it. I’d recommend keeping the host code and device code entirely distinct, in separate files, as much as possible. It’s not a hard rule, just a suggestion; shared host/device utility functions are common, but try not to mix code intended to be host-only with code intended to be device-only in the same files.
is there a way I can combine multiple ptx files into a single ptx?
With OptiX, you can link multiple modules into a pipeline. There shouldn’t be any need to merge ptx or optixir files since you can use multiple modules in a pipeline, since you can have multiple program entry points in a single .cu & .ptx/.optixir file, and since for each program group you can specify a different entry point. Think of optixModuleCreate() sort of like compiling an object file and optixPipelineCreate() as linking multiple object files.
You can also have multiple pipelines in an OptiX-based application, but pipelines don’t communicate directly, so for the purposes of code that communicates with or calls other code, you just need everything in the same pipeline, but not necessarily in the same module. The caveat to be aware of here is that OptiX API device functions usually need to be inlined and/or marked as “optix-enabled”. See the “non-inlined functions” section of the “Callables” chapter in the OptiX Programming Guide for more info about optix-enabled functions. [1] You can also use Direct Callables or Continuation Callables for more explicit non-inlined function calls. Functions that call optixTrace specifically should plan to use Continuation Callables. [2]
With your own code that doesn’t call any OptiX API functions, you can use regular non-inlined functions and link those function calls across different modules.
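As a rough sketch of that multi-module setup with the OptiX 8 host API (compile/link options, error checking, and all names here are placeholders, not your actual setup):

#include <optix.h>
#include <optix_stubs.h>
#include <string>

// Sketch: build two modules from separately compiled PTX/OptiX-IR inputs and
// link them into one pipeline via program groups.
OptixPipeline buildPipeline(OptixDeviceContext context,
                            const OptixModuleCompileOptions& moduleOptions,
                            const OptixPipelineCompileOptions& pipelineOptions,
                            const OptixPipelineLinkOptions& linkOptions,
                            const std::string& raygenIr, const std::string& missIr)
{
    char log[2048]; size_t logSize = sizeof(log);

    OptixModule raygenModule = nullptr, missModule = nullptr;
    optixModuleCreate(context, &moduleOptions, &pipelineOptions,
                      raygenIr.data(), raygenIr.size(), log, &logSize, &raygenModule);
    logSize = sizeof(log);
    optixModuleCreate(context, &moduleOptions, &pipelineOptions,
                      missIr.data(), missIr.size(), log, &logSize, &missModule);

    // One program group per entry point; each group can reference a different module.
    OptixProgramGroupDesc descs[2] = {};
    descs[0].kind = OPTIX_PROGRAM_GROUP_KIND_RAYGEN;
    descs[0].raygen.module            = raygenModule;
    descs[0].raygen.entryFunctionName = "__raygen__main";
    descs[1].kind = OPTIX_PROGRAM_GROUP_KIND_MISS;
    descs[1].miss.module              = missModule;
    descs[1].miss.entryFunctionName   = "__miss__main";

    OptixProgramGroupOptions pgOptions = {};
    OptixProgramGroup groups[2];
    logSize = sizeof(log);
    optixProgramGroupCreate(context, descs, 2, &pgOptions, log, &logSize, groups);

    // The "link" step: every program group (and so both modules) ends up in one pipeline.
    OptixPipeline pipeline = nullptr;
    logSize = sizeof(log);
    optixPipelineCreate(context, &pipelineOptions, &linkOptions,
                        groups, 2, log, &logSize, &pipeline);
    return pipeline;
}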
With OptiX, you can link multiple modules into a pipeline. There shouldn’t be any need to merge ptx or optixir files since you can use multiple modules in a pipeline
My issue is that my OptiX module is made of multiple .cu files, where each .cu file is associated with a class (for example Scene, Emitter, Integrator, …). Out of the nvcc compilation (e.g. to PTX) I’m getting Scene.ptx, Emitter.ptx, Integrator.ptx, …
So I could programmatically load all of these as OptiX modules and link them within the pipeline. But this design doesn’t seem flexible to me: if I add a new class, I have to programmatically load another module.
What I was looking for is a way to group these modules into one PTX file that I can load as a single module… is there something like that? Or maybe my OptiX module code structure is wrong, and I should go for a single translation unit?
P.S.: I’m talking about PTX, but the same goes for OptiX-IR.
You can use nvcc to compile multiple .cu files at once using the relocatable device code flag, if you want it to not inline your classes and instead link them. (With the already-mentioned caveat that OptiX API functions must be inlined and/or optix-enabled.) But doing this by default for all your code will likely backfire. There are multiple problems with approaching GPU code that way, so I would definitely recommend studying existing solutions more and seeing how people achieve code sharing and code organization in CUDA without using the common CPU-based approach of producing one object file per class.
The most common stance, or maybe “default assumption”, for GPU code is that everything will be inlined into your entry point function. In order to organize your code to make this possible, you could mainly build header-only classes; that’s one common approach. Then you don’t need multiple .cu files: you can instead have one .cu file per entry point or program group, and the .cu file can #include all the helper classes it needs. Yes, this can result in lots of duplicated compiled binary instructions; that is essentially what heavy inlining can lead to.
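For example, a single entry-point .cu could look something like this (hypothetical header and class names, just to show the shape):

// RaygenPrograms.cu -- one translation unit per entry point / program group.
// All helpers are header-only so everything can be inlined into the entry point.
#include <optix_device.h>
#include "LaunchParams.h"   // hypothetical: holds scene/camera/output pointers
#include "Sampler.h"        // hypothetical header-only helper classes
#include "Integrator.h"

extern "C" __constant__ LaunchParams params;

extern "C" __global__ void __raygen__pathTrace()
{
    const uint3 idx = optixGetLaunchIndex();
    Sampler sampler(idx);                                           // inlined
    const float3 radiance = Integrator::Li(params, idx, sampler);   // inlined
    params.image[idx.y * params.width + idx.x] = radiance;
}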
You do have control over preventing inlining as needed, as I mentioned earlier, using __noinline__, __optix_enabled__, Direct Callables, or Continuation Callables. But if you care about performance, you will probably want to be very careful about choosing to link individual classes & functions rather than inlining them. Linked function calls on the GPU are much more expensive than on the CPU, and they prevent optimization across the function call boundary. For this reason, it’s a good idea to assume that you might not want to organize GPU code the same way that CPU code is organized.
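As a small sketch of the Direct Callable route (hypothetical names):

#include <optix_device.h>
#include <cuda_runtime.h>

// In one module: a Direct Callable entry point, which is not inlined into its callers.
extern "C" __device__ float3 __direct_callable__sample_emitter(float3 p)
{
    // ... emitter sampling code would go here ...
    return make_float3(0.0f, 0.0f, 0.0f);
}

// At the call site (possibly in another module of the same pipeline): the callable
// is selected through its SBT index rather than through a direct symbol reference.
static __device__ float3 sampleLight(unsigned int callableSbtIndex, float3 p)
{
    return optixDirectCall<float3, float3>(callableSbtIndex, p);
}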
We do not recommend building one OptiX module for each class you have and attempting to link the code in the pipeline. We do recommend organizing your code around the entry points and/or program groups that you will need.