I developed an application using OptiX (OptiX + CUDA, not OptiX Prime).
When I launch it, I always get roughly the same results (within +/- 1e-8). The reason is that I perform a lot of additions that exceed single-precision floating-point accuracy, and the order in which they happen may vary. However, when I launch several instances of my application in parallel on the GPU (I do this because the tool is part of a chain that also uses other tools running in parallel on the CPU), I get very different results from my application, while the results of the other tools remain quite stable. This behavior is fairly random with one extra instance, but under heavy load (at least 3 instances) it always happens. I first suspected a synchronization issue and added some extra cudaThreadSynchronize() calls (now deprecated in favor of cudaDeviceSynchronize()) to be sure, but that did not solve the issue.
So my guess is that you cannot run multiple instances of an application that uses OptiX at the same time on a single GPU. Is that correct? The alternative is that something is obviously wrong with my code.
Note that I am using blockSize = deviceProp.maxThreadsPerBlock as the block size.
Graphics: Card: NVIDIA GK107 [GeForce GTX 650]
Display Server: X.Org 1.18.4 drivers: nvidia (unloaded: fbdev,vesa,nouveau)
GLX Renderer: GeForce GTX 650/PCIe/SSE2 GLX Version: 4.6.0 NVIDIA 396.26