Emulating a second GPU to test multi-GPU CUDA code on a single-GPU machine

Hello,

I’m developing a CUDA-based program that must support multiple NVIDIA GPUs working concurrently. This includes selecting devices, running kernels on them, and managing resources across devices.

However, my development machine only has one physical GPU, and I need a way to test the multi-GPU logic locally before deploying it to the client’s machine, which has several GPUs.

Is there a way to simulate or emulate a second GPU that can run actual CUDA kernels?
Ideally, I’d like something that makes the CUDA runtime report at least two devices (e.g., cudaGetDeviceCount() returning a count of 2), even if the second device internally maps to the same hardware or uses a fallback.
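
To make the goal concrete, here is a minimal sketch of the kind of device-selection logic that needs exercising. It is an illustration under assumptions, not the actual program: `assign_round_robin` is a hypothetical helper, the round-robin policy is assumed, and the device count is passed in as a plain integer where real code would obtain it from cudaGetDeviceCount(). It only covers the selection step and does not launch any kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps each job index to a device ordinal, round-robin.
// In a real CUDA program, device_count would come from cudaGetDeviceCount(),
// and each job would call cudaSetDevice(ordinal) before launching its kernels.
std::vector<int> assign_round_robin(int num_jobs, int device_count) {
    assert(num_jobs >= 0 && "job count must be non-negative");
    assert(device_count > 0 && "need at least one device");

    std::vector<int> device_for_job;
    device_for_job.reserve(static_cast<std::size_t>(num_jobs));
    for (int job = 0; job < num_jobs; ++job) {
        // Job 0 -> device 0, job 1 -> device 1, ..., wrapping around.
        device_for_job.push_back(job % device_count);
    }
    return device_for_job;
}
```

With five jobs and a device count of 2, this yields ordinals 0, 1, 0, 1, 0. With a count of 1, every job lands on device 0, so the multi-device path never runs on a single-GPU machine.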

Requirements:

  • Both devices (real and emulated) must be able to run actual CUDA kernels.
  • My code uses ONNX Runtime with the CUDA Execution Provider.
  • I don’t need high performance from the emulated device, just the ability to test real kernel dispatch on multiple devices.
  • I’m not using Docker or VMs, and would prefer a solution that works directly on Windows 11.

Development Environment:

  • Windows 11
  • CUDA 11.8
  • Visual Studio 2022
  • 1 NVIDIA GPU

Any ideas, tools, or workarounds to help simulate multi-GPU behavior would be greatly appreciated.

Thanks!