[Solved] Can you run multiple instances of an application using OptiX on a single GPU?

I developed an application using OptiX (OptiX + CUDA, not OptiX Prime).

When I launch it, I always get roughly the same results (+/- 1e-8). The small variation comes from performing a large number of additions that exceed floating-point precision, so the order in which they happen can change the result slightly. However, when I launch several instances of my application in parallel on the GPU (I do that because the tool is part of a chain whose other tools run in parallel on the CPU), I get very different results for my application, although those results are in turn fairly consistent between runs. With one extra instance the behavior is fairly random, but under heavy load (at least 3 instances) it always happens. I first suspected a synchronization issue and added some extra cudaThreadSynchronize() calls to be sure, but that did not solve the issue.

So my guess is that you cannot run multiple instances of an application using OptiX at the same time on a single GPU. Is that correct? The alternative is that something is obviously wrong with my code.

Note that I am using blockSize = deviceProp.maxThreadsPerBlock as the block size.

tested with:

Graphics: Card: NVIDIA GK107 [GeForce GTX 650]
Display Server: X.Org 1.18.4 drivers: nvidia (unloaded: fbdev,vesa,nouveau)
Resolution: 1680x1050@59.95hz, 1680x1050@59.95hz
GLX Renderer: GeForce GTX 650/PCIe/SSE2 GLX Version: 4.6.0 NVIDIA 396.26

In principle I would expect multiple instances of OptiX to run in parallel on the same board. They will just not scale much because they compete for unique resources.

Have you debugged the OptiX code (exception program with exceptions enabled)?
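For reference, enabling exceptions in OptiX 5.x looks roughly like the following sketch (host side via the optixu C++ wrapper, device side as an exception program). It requires the OptiX SDK and a GPU to actually run, so treat it as an illustration rather than a drop-in; the PTX file name is a placeholder.

```cpp
// Host side (optixu C++ wrapper, OptiX 5.x):
optix::Context context = optix::Context::create();
context->setExceptionEnabled(RT_EXCEPTION_ALL, true);
context->setExceptionProgram(0 /* entry point */,
    context->createProgramFromPTXFile("exception.ptx", "exception"));

// Device side (compiled to PTX), in the .cu file:
//
// RT_PROGRAM void exception()
// {
//     rtPrintExceptionDetails();  // prints exception type and launch index
// }
```

With RT_EXCEPTION_ALL enabled, out-of-bounds buffer accesses, stack overflows, and similar errors are reported instead of silently corrupting results; disable it again for performance measurements.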

What’s the VRAM configuration of that board?
What’s the workload size of your OptiX task? Does it fit for multiple processes at the same time?

Does that also reproduce when not using multi-monitor to free up some VRAM?

I would recommend updating to OptiX 5.1.0 and seeing if the problem persists.
CUDA 9.2 is not officially supported with OptiX 5 versions. I’ve been using CUDA 9.0 most of the time, so trying that would be another option.
Or try newer display drivers with newer CUDA drivers.

Other than that, there is little to investigate with the given information. There is always the potential that something is not working correctly in the drivers, OptiX, or your code; scribbling over some memory area, a flipped bit, or a flaky power supply could all be responsible.

Noted. That makes sense. I use the approach I described because the CPU is the limiting factor and I cannot batch all the queries on the GPU at the end.

No, I did not. I have not looked at that feature, but I will if I find more issues.

I don’t know how to get that information, but here is some output:

lspci -vnn | grep VGA -A 12
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK107 [GeForce GTX 650] [10de:0fc6] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Gigabyte Technology Co., Ltd GK107 [GeForce GTX 650] [1458:3555]
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Memory at de000000 (32-bit, non-prefetchable)
	Memory at c0000000 (64-bit, prefetchable)
	Memory at d0000000 (64-bit, prefetchable)
	I/O ports at dc80
	[virtual] Expansion ROM at 000c0000 [disabled]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_396, nvidia_396_drm


I am not sure I understand the question. All the processes fit in the VRAM.

I have not tried that, but the memory load stays just below 50% when monitoring with nvidia-smi while running a lot of instances.

I installed NVIDIA-OptiX-SDK-5.1.0-linux64 + CUDA 9.0. I am running some tests and will post back.


I had the same issue on a K5000, but with the same OptiX + CUDA setup, so it is software-related.

Thank you for your help !

This solved the issue. Thank you !