Multi-GPU support for 50xx series

Greetings,

We’re encountering significant issues getting multi-GPU configurations working properly on the RTX 50xx series.
The Microsoft heterogeneous multiadapter sample fails during the CreateHeap() call when using the following heap flags: D3D12_HEAP_FLAG_SHARED | D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER
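For reference, here is a minimal sketch of the failing call, loosely following the pattern used by the heterogeneous multiadapter sample; the device pointer and heap size are placeholders for illustration only, not the sample's exact code:

#include <d3d12.h>
#include <wrl/client.h>

// Minimal sketch: create a shared cross-adapter heap on the primary device.
// Size and device are placeholders.
HRESULT CreateCrossAdapterHeap(ID3D12Device* device,
                               Microsoft::WRL::ComPtr<ID3D12Heap>& heap)
{
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes     = 64ull * 1024 * 1024;   // placeholder size
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    heapDesc.Alignment       = 0;                     // default alignment
    heapDesc.Flags           = D3D12_HEAP_FLAG_SHARED |
                               D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER;

    // This is the call that fails on the RTX 50xx systems described here,
    // while the same call succeeds on 20xx/30xx/40xx.
    return device->CreateHeap(&heapDesc, IID_PPV_ARGS(&heap));
}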

In addition, all legacy applications that previously worked flawlessly on RTX 20xx, 30xx, and 40xx series GPUs now report a driver internal error on the RTX 50xx series:
D3D12 WARNING: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR: There is strong evidence that the driver has performed an undefined operation; but it may be because the application performed an illegal or undefined operation to begin with.).
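As an aside, once a subsequent call starts returning DXGI_ERROR_DEVICE_REMOVED, the removal reason can also be queried directly from the device; on the systems above it matches the warning quoted here. A minimal sketch (the helper name is ours, not from any sample):

#include <windows.h>
#include <d3d12.h>

// Hypothetical helper: after a call has failed with DXGI_ERROR_DEVICE_REMOVED,
// query the device for the underlying removal reason.
void LogRemovalReason(ID3D12Device* device)
{
    const HRESULT reason = device->GetDeviceRemovedReason();
    if (reason == DXGI_ERROR_DRIVER_INTERNAL_ERROR)
    {
        // Matches the debug-layer message quoted above on RTX 50xx.
        OutputDebugStringA("Device removed: DXGI_ERROR_DRIVER_INTERNAL_ERROR\n");
    }
}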

System configuration:
2x NVIDIA RTX 5070, 580.88 driver, Windows 10.

We’ve also confirmed that GravityMark fails in a similar way on a system with two RTX 5090 GPUs.

Thank you

Hello,

This issue is reproducible on our end as well.

It is apparent only on 5.xxx series GPUs from NVIDIA.

The issue does NOT manifest on 4.xxx, 3.xxx, and 2.xxx series cards.

5.xxx series does work in mGPU under VK; the failure is specific to DX12.

AMD shows no issues in mGPU under either DX12 or VK.

This is not a reference to SLI or CrossFire; this is strictly DX12 mGPU.

Whenever a DX12 mGPU option is selected from an application's menu, the same heap flags and device-removal warning appear on execution:

D3D12_HEAP_FLAG_SHARED | D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER

D3D12 WARNING: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR)

Summary:

The issue is localized to 5.xxx series NVIDIA GPUs in DX12 mGPU (non-SLI).

It is 100% reproducible on 5.xxx GPUs in mGPU/DX12.

Additionally, the issue is independent of any BIOS setting or motherboard.

Tested on Asus WS PRO, Asus Sage, Gigabyte MZ72 series, and MZ73 series boards.

The OS was fully patched Windows 11, as well as Windows Server Datacenter 2022 and 2025.

All GPUs tested were Founders Edition models (except for the AMD GPUs, of course).

This applies to all drivers so far, from 572.16 to 581.15.

Hi there @frustum and @jayventuri, nice to see you on the NVIDIA developer forums.

Usually we point users with consumer-side issues to the GeForce Forums.

In this case, I was able to find this issue already in our internal bug tracker.

The good news is that there is a fix for it, but it is not yet certain which release driver will contain the fix.

I am currently trying to find that out.

Thanks!

Thank you, Markus. If you need someone to beta test or give feedback on the fix/driver, please let me know. Glad to help.

Jay

Markus, thank you for the update. It would be great to have a driver to test.

I might have been too rushed in my response, sadly. The fix had already been deployed, but it seems it addressed an Omniverse-specific use case. It has the same signature, though, so engineering at least has a good starting point for what to look for.

I opened a new internal bug and will let you know if I need some more input.

One thing that might help would be a kernel (memory) dump if any of you can provide one after the issue occurs.

Thanks!

Thank you, I’ll see what we can do for a kernel memory dump


Sir,

I solved the DX12 errors with multiple 5.xxx series GPUs.

For example, DX12 with dual 5090s is SOLVED.

This also solved it for:

GravityMark in DX12 mGPU AFR
Ashes of the Singularity DX12 mGPU
Strange Brigade in DX12 mGPU
And several UE5 games mGPU

Utilization in GravityMark, as an example:
GPU 1: 96-100%
GPU 2: 83-94%

I still have some refinements to do, but I’m most of the way there

Lastly, overall performance increased for some single-GPU benchmarks/games/apps.

Details to follow

J

mGPU for 5.xxx series GPUs is now working

tested on:

dual 5090 FE

dual 5080 FE

Details to follow


That is great news!

Which driver version is that now @jayventuri ?

The fix is not specific to any driver. I believe it works for all drivers from 72.x to 81.x for all 5.xxx series GPUs.

I tested the fix on 72.x, 73.x, 76.x, and 81.x drivers.

I’ll post details of the fix shortly

The Omniverse Kit (thank you for letting me know about the prior issue) is now working fine with dual 5.xxx series cards in DX12 applications.

So I went on a deep dive through the profile parameters for the Omniverse Kit in Profile Inspector.

There are several modifications unique to that profile compared to other profiles, so I tested each one to isolate the key that makes or breaks dual 5.xxx series cards working in mGPU mode in DX12.

The only differences that turned out to be effective are:

That was all it took.

REGARDLESS of whether those feature flags are used or not, both have to be disabled in the app/game's profile for DX12, or mGPU will not work.

I expected something more elaborate as I was diligently eliminating Omniverse-specific keys, but it turns out only these two variables need to be set.

What makes NO SENSE is a side effect: some benchmarks score higher in single-GPU mode with those variables set.

Obviously there is a significant issue in the NVIDIA drivers from 72.xx to 81.xx, and this is just a workaround; hopefully NVIDIA will fix it soon.