So I have been working on a bidirectional path tracer with OptiX and, having finally completed it, moved on to analyzing the kernels. With OptiX, the only way I found to actually time device code was the following:
long long start_time = clock64(); // clock64() avoids the 32-bit wraparound of clock()
// some small amount of code
long long stop_time = clock64();
long long elapsed = stop_time - start_time;
rtPrintf("time in func %fms\n", (float)elapsed / clockRate);
where clockRate is the shader clock frequency, which can be queried on the host from cudaDeviceProp::clockRate (reported in kHz, i.e. clock cycles per millisecond).
My question is: is there any other “standard” way to do this inside kernel functions, something like cudaEvents? (cudaEvents do not work in an RT_PROGRAM, since the cudaEvent API is host-side.)
Also, profiling with Nsight gives information only about the top-level kernel function. Am I missing something, or is there another way to profile with OptiX?
For low-level performance counting this is currently the best option. Nsight can also be useful for higher-level statistics (e.g., total local memory accesses). A variation on your code above is to use your ‘time’ variable to render a heat map, where costly pixels are white and cheap pixels are close to black.
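The heat-map idea could be sketched in a ray generation program roughly as follows. This is only a sketch under assumptions: output_buffer, the per-pixel work, and the hand-tuned normalization constant max_cycles are all placeholders, not code from the thread.

```cuda
#include <optix_world.h>

using namespace optix;

rtDeclareVariable(uint2, launch_index, rtLaunchIndex, );
rtDeclareVariable(float, max_cycles, , );  // hypothetical normalization constant, tuned by hand
rtBuffer<float4, 2> output_buffer;         // assumed RGBA output buffer

RT_PROGRAM void timed_raygen()
{
    long long start = clock64();

    // ... trace the ray and shade the pixel as usual ...

    long long elapsed = clock64() - start;

    // Map the cycle count to grayscale: expensive pixels -> white,
    // cheap pixels -> black.
    float t = fminf((float)elapsed / max_cycles, 1.0f);
    output_buffer[launch_index] = make_float4(t, t, t, 1.0f);
}
```

Writing the timing value instead of the radiance makes hotspots visible directly in the rendered image, with no extra tooling.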
We are hoping to improve Nsight support so that source-level profiling can be performed on OptiX code, but this is not ready yet.
We are also working on some improved diagnostic reporting tools for an upcoming release. These will include reporting of time spent and invocation counts for each of the programs in your kernel, as well as other helpful statistics.
Thank you for your response; I'm looking forward to the upcoming updates to Nsight and OptiX. The heat map turned out to be a really effective analysis tool, and thank you for the idea.