When I run my application and look at the nvidia-settings panel, it shows that my three K6000s are only at around 25% load, and the single Quadro 5000 that drives the projection is at a similar level. PCIe load is shown at around 30%. I did a quick test with nvprof, but it seems of limited use with OptiX: almost all of the time (>95%) is attributed to __globfunc__Z7trace_0v, which I assume is the kernel assembled from my PTX programs. Is there a better way to find out where my application is spending most of its time?
System: OptiX 3.8.0, CUDA 7.0 on Fedora 21 x86_64; 3x K6000, 1x Quadro 5000 (not used in the OptiX context); driver 346.87.
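Since GPU load sits around 25%, one thing worth checking is whether the GPUs are waiting on the host. A minimal C++11 sketch of a host-side phase timer, assuming the frame loop can be split into phases such as the OptiX launch and the copy to the projection GPU. The phase names and `example_frame` are hypothetical placeholders (the sleeps stand in for real work); nothing here is part of the OptiX API.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <thread>

// Accumulates wall-clock time (in milliseconds) per named phase of a frame loop.
class PhaseTimer {
public:
    using Clock = std::chrono::steady_clock;

    // RAII guard: adds the time between construction and destruction to a phase.
    class Scope {
    public:
        Scope(PhaseTimer& timer, const std::string& name)
            : timer_(timer), name_(name), start_(Clock::now()) {}
        ~Scope() {
            timer_.totals_[name_] +=
                std::chrono::duration<double, std::milli>(Clock::now() - start_).count();
        }
    private:
        PhaseTimer& timer_;
        std::string name_;
        Clock::time_point start_;
    };

    // Total time recorded for a phase, or 0 if the phase was never timed.
    double total_ms(const std::string& name) const {
        auto it = totals_.find(name);
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Prints one line per phase, sorted by name.
    void report() const {
        for (const auto& kv : totals_)
            std::printf("%-12s %10.3f ms\n", kv.first.c_str(), kv.second);
    }

private:
    std::map<std::string, double> totals_;
};

// Hypothetical frame: replace each sleep with the real call for that phase
// (e.g. the OptiX launch, mapping the output buffer, uploading to the projector).
inline void example_frame(PhaseTimer& timer) {
    {
        PhaseTimer::Scope s(timer, "launch");
        std::this_thread::sleep_for(std::chrono::milliseconds(3));
    }
    {
        PhaseTimer::Scope s(timer, "copy");
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

After a few hundred frames, `report()` shows which phase dominates; if "launch" is only a small fraction of the frame time, the bottleneck is likely outside the OptiX kernel.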