We (http://www.lavision.de) are using multiple GPUs in our software. Each GPU is driven by one thread (OpenMP). With different GPU models we get the error “cudaErrorMemoryAllocation”. With two or four identical GPUs everything works (a minimal sketch of the pattern follows below the specs).
Specs:
Win7 / Win10 (both x64)
Driver: 398.82 → OK with same GPUs, NOK with different GPUs
Driver: 391.35 → OK with same and different GPUs
GPUs: GTX 680 + GTX 1080 Ti → NOK
GTX 980 Ti + GTX 1080 → NOK
2 x GTX Titan → OK
The same error occurs in the “cudaOpenMP” sample.
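For reference, here is a minimal sketch of the pattern (not our production code; the 64 MB allocation size is arbitrary): one OpenMP host thread per GPU, each thread selects its device and makes a single allocation, essentially what the “cudaOpenMP” sample does. With mixed GPUs and driver 398.82, this is the kind of per-thread cudaMalloc() call that comes back with cudaErrorMemoryAllocation on one of the devices.

#include <cstdio>
#include <omp.h>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    printf("Found %d CUDA devices\n", deviceCount);

    // One host thread per GPU, as in the "cudaOpenMP" SDK sample.
    #pragma omp parallel num_threads(deviceCount)
    {
        int dev = omp_get_thread_num();
        cudaSetDevice(dev);

        void* dPtr = nullptr;
        cudaError_t err = cudaMalloc(&dPtr, 64 * 1024 * 1024);  // 64 MB, arbitrary size
        if (err != cudaSuccess)
            printf("GPU %d: cudaMalloc failed: %s\n", dev, cudaGetErrorString(err));
        else
            printf("GPU %d: allocation OK\n", dev);

        cudaFree(dPtr);
    }
    return 0;
}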
I have the same issue; with the newest drivers installed, my path tracer can only use a single GPU.
(The GPUs in my system are a GTX 1080 Ti and a GTX 980; with driver 391.35 both work just fine, but with the newer drivers one of the cards fails (which one depends on the exact driver version and seems somewhat arbitrary).)
Hi TheDonsky,
Can you be more specific about your issue? I tried to reproduce it in the lab but had no luck. Regarding “with the newer ones one of the cards fails”: can you share which newer driver versions you mean? And you are using the CUDA 9.2.148 toolkit, right?
I am using CUDA 9.0 and testing on a Windows 7 machine (CUDA 9.2 needs a newer driver to even run). The code either fails on allocation or on the kernel launch (the issue is from a while ago and I can’t recall the exact error, because the application now has a GUI and is not logging anything to the console). Also, the code uses some templated function pointers on top of everything, and I thought the problem might be mismatched addresses, but I have nothing to prove it…
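(For what it’s worth, here is a small sketch of how one could still capture the exact error from a GUI build by appending each failing CUDA call to a log file; the macro and file name are just illustrative, not from my project.)

#include <cstdio>
#include <cuda_runtime.h>

// Append every failing CUDA call to a log file, so the exact error can be
// recovered later even when the application has no console.
#define CUDA_LOG(call)                                                  \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            if (FILE* f = fopen("cuda_errors.log", "a")) {              \
                fprintf(f, "%s:%d %s -> %s\n", __FILE__, __LINE__,      \
                        #call, cudaGetErrorString(err_));               \
                fclose(f);                                              \
            }                                                           \
        }                                                               \
    } while (0)

// Usage:
//   CUDA_LOG(cudaMalloc(&devPtr, bytes));
//   myKernel<<<grid, block>>>(args);
//   CUDA_LOG(cudaGetLastError());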
The memory management system in my project is too convoluted to be worth investigating directly, so I would not recommend digging into the code; I’ll just try to reproduce the issue on a much smaller scale, in a single file or something like that, and post it here. The only problem is that this is a personal side project I can’t get involved in right now, so it may have to wait a couple of days.
(Note: on my Windows 10 machine everything seems to work properly, just like with the older drivers, but it’s a single-GPU system, so it’s largely irrelevant for this case.)
I can also confirm that the bug is gone after updating to 399.07; performance is a little better as well.
So I don’t see an obvious reason to keep trying to replicate the issue, and I will likely drop my earlier promise, unless someone thinks we still need to work on the case.