We’ve recently upgraded our evaluation system from Fermi-based to Kepler-based GPUs (3x Quadro 5000 -> 3x Quadro K6000), running under Linux x86_64 with OptiX 3.5.1, CUDA 5.5, and current drivers.
Unfortunately, there has been little (if any) improvement in tracing performance, and we are wondering whether our application is doing something silly that slows things down. At its core we have a recursive ray tracer: mostly primary and shadow rays (~3 light sources), with some recursion for transparent or reflective surfaces. The main render buffer is an OpenGL buffer object (usage: GL_STREAM_DRAW) shared with OptiX, with the buffer type set to RT_BUFFER_OUTPUT and the format set to RT_FORMAT_FLOAT4 (tone mapping is done by the OpenGL shader that draws this buffer to the screen). For an average frame, around 50 variables are assigned (rtVariableSet*), including the ones that hold the output buffer.
I know this is a fairly vague description, but perhaps something jumps out as a bad idea for a multi-GPU setup? Suggestions on how to analyze this better are also very much appreciated. Thanks!