Hey, I found some strange behavior related to RT_CALLABLE_PROGRAM.
If I call markDirty() for an instance while using RT_CALLABLE_PROGRAM, the next launch call becomes drastically slower than when I do not call markDirty().
I am using the OptiX 3.0 SDK, and you can easily reproduce this situation.
I used the whirligig sample source and just added the code below to pinhole_camera.cu.
I assume you used markDirty on an acceleration structure. In that case, the next launch call rebuilds the acceleration structure before the actual ray tracing begins, which causes the drop in FPS. I don’t think this has anything to do with RT_CALLABLE_PROGRAM.
Look at page 27 of the OptiX 3.0 Programming Guide. The “refine” or “refit” property might be what you need if you don’t want a full rebuild of the acceleration structure.
I know what you mean, which is why I called it “strange behavior”. I also agree with you that rebuilding the acceleration structure should have nothing to do with RT_CALLABLE_PROGRAM.
But it has a great impact on FPS. The original whirligig runs at about 50 FPS in my test environment, but when I add only the above code, it drops to about 4–5 FPS.
I have already tested the above code with the “refine” and “refit” properties enabled, and with other acceleration structure types. It improved only a little.
Additionally, if I don’t call markDirty() while using RT_CALLABLE_PROGRAM, it works normally.
This may be a bug; I’ve managed to replicate your problem. I believe it has something to do with the optimization routine within OptiX at compile time. The callable function might be removed when it is not used, leading to some unknown complication. If you call the RT_CALLABLE_PROGRAM function somewhere within any RT_PROGRAM, the FPS is not affected.
Dear,
I have run into another problem related to RT_CALLABLE_PROGRAM.
I tried to repeat haruband’s experiment with callable programs using the whitted sample.
I inserted the code below into pinhole_camera.cu, just before the line “RT_PROGRAM void pinhole_camera()”:
Hi Victor, it’s likely that you’re compiling your code targeting sm_1.x devices.
In that case, and with the function you’re inserting, there’s a small caveat in optix_device.h:
/* This is used to declare programs that can be attached to variables and called from
 * within other RT_PROGRAMS.
 *
 * There are some limitations with PTX that is targetted at sm_1x devices.
 *
 * 1. Functions declared with RT_CALLABLE_PROGRAM will not be emitted in the PTX unless
 *    another function calls it. This can be fixed by declaring a __global__ helper
 *    function that calls the desired function.
 *
 *    RT_CALLABLE_PROGRAM
 *    float3 simple_shade(float multiplier, float3 input_color)
 *    {
 *      return multiplier * input_color;
 *    }
 *
 *    #if __CUDA_ARCH__ < 200
 *    __global__ void stub() {
 *      (void) simple_shade( 0, make_float3(0,0,0) );
 *    }
 *    #endif
 *
 * 2. You can't pass pointers to functions or use integers for pointers. In the first
 *    case CUDA will force the inline of the proxy function removing the call altogether,
 *    and in the case of passing pointers as integers, CUDA will assume that any pointer
 *    that was cast from an integer will point to global memory and could cause errors
 *    when loading from that pointer. If you need to pass pointers, you should target
 *    sm_20.
 */
To summarize: your function is being elided from the generated PTX if you’re compiling for sm_1.x. You have two options:
1. Add an intermediate target (e.g. -arch=compute_20) to your NVCC compilation command. Look in the wherever_you_cmake’d_the_OptiX_SDK\whitted\CMakeFiles\whitted.dir\whitted_generated_pinhole_camera.cu.ptx.cmake script that calls NVCC, find the line
set(CUDA_NVCC_FLAGS --use_fast_math;--compiler-options;/D_USE_MATH_DEFINES ;; ) # list
and add that option:
set(CUDA_NVCC_FLAGS -arch=compute_20;--use_fast_math;--compiler-options;/D_USE_MATH_DEFINES ;; ) # list
2. Add a helper function as described above to work around the sm_1.x issue.
Indeed, while reading the Programming Guide I saw that compute capability 1.x needs special handling, but I did not pay attention because I have a different GPU.
Oops: after my next edit I ran a build of whitted, and VS2010 asked to reload the projects of all SDK samples. I looked in the same place and the old version of the line was back. So each reload returns the file to its initial state, which means that every time I need to check and re-edit
whitted\CMakeFiles\whitted.dir\whitted_generated_pinhole_camera.cu.ptx.cmake
I have found that the original CMakeLists.txt file (in the SDK directory) only sets set(flag "--use_fast_math"). It would be nice if CMakeLists.txt also set "-arch=compute_20". In that case I would need to run CMake only once for each project.
A global NVCC setting probably exists; that would also be a solution for me.
Could you help me?
Next, I would be happy if future versions of the OptiX setup took higher compute capabilities into account in one way or another, e.g., with several CMakeLists.txt files.
In any case, I think the OptiX setup should not ignore higher capabilities. For example, there could be a note about this in CMakeLists.txt or INSTALL-WIN.txt.
Best wishes,
Victor
I agree with your concerns and, by the way, sm_10 devices are being deprecated in CUDA 6, so hopefully we’ll be able to solve this gracefully before long.
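In the meantime, one way to keep the flag from being overwritten might be to add it in the SDK's top-level CMakeLists.txt, next to the place where --use_fast_math is set, since the generated .ptx.cmake files are recreated from there. The lines below are only a sketch, not something taken from the SDK: they assume CUDA_NVCC_FLAGS in that file is the same FindCUDA flag list that appears in the generated file, and arch_flag and arch_index are made-up variable names.

```cmake
# Sketch only: place this in the SDK's top-level CMakeLists.txt, near the
# existing --use_fast_math handling (assumed location, adjust as needed).
# arch_flag and arch_index are hypothetical names used for illustration.
set(arch_flag "-arch=compute_20")

# Look for the flag first so repeated configures do not add it twice.
list(FIND CUDA_NVCC_FLAGS ${arch_flag} arch_index)
if(arch_index EQUAL -1)
  # CUDA_NVCC_FLAGS is the semicolon-delimited list FindCUDA passes to NVCC.
  list(APPEND CUDA_NVCC_FLAGS ${arch_flag})
endif()
```

These lines would need to come before the sample subdirectories are added, and CMake has to be re-run once so that the regenerated files pick up the flag. CUDA_NVCC_FLAGS should also be editable as a cache entry in the CMake GUI, which may be the global setting asked about above.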