I developed an application using OptiX (OptiX + CUDA, not OptiX Prime).
When I launch it, I always get about the same results (±1e-8). This is because I perform a lot of additions that exceed floating-point precision, and the order in which they happen may vary. However, when I launch several instances of my application in parallel on the GPU (I do this because the tool is part of a chain that also uses other tools running in parallel on the CPU), I get a very different result from my application, although that other result is also quite consistent. The behavior is fairly random with one additional instance, but under heavy load (at least 3 instances) it always happens. I first suspected a synchronization issue and added some extra cudaThreadSynchronize() calls to be sure, but that did not solve the issue.
So my guess is that you cannot run multiple instances of an OptiX application at the same time on a single GPU. Is that correct? Otherwise, something must be wrong with my code.
Note that I am using the following block size: blockSize = deviceProp.maxThreadsPerBlock
In principle, I would expect multiple OptiX instances to run in parallel on the same board. They just won't scale well because they compete for the same limited resources.
Have you debugged the OptiX code (with an exception program and exceptions enabled)?
What’s the VRAM configuration of that board?
What’s the workload size of your OptiX task? Does it fit into VRAM for multiple processes at the same time?
Does the problem also reproduce when you don't use a multi-monitor setup, which would free up some VRAM?
I would recommend updating to OptiX 5.1.0 and checking whether the problem persists.
CUDA 9.2 is not officially supported with OptiX 5 versions. I’ve been using CUDA 9.0 most of the time, so trying that would be another option.
Or try newer display drivers, which come with newer CUDA drivers.
Other than that, there is little to investigate with the information given. There is always the possibility that something is not working correctly in the drivers, in OptiX, or in your code; something scribbling over a memory area, a flipped bit, or a flaky power supply could all be responsible.