Best Practices for Running Multiple Pixel Streaming Instances (Unreal/Unity) on a Single GPU – Real Partitioning with Vulkan/DirectX?

Hello everyone,

I’ve spent quite some time researching this topic but would really appreciate some insights from the experts here.

Use Case:
We have an Unreal Engine project that uses Pixel Streaming to deliver real-time rendered content to end users via the browser. Each user requires their own instance of the app. We’ve already containerized the project (Docker) and GPU passthrough works perfectly.

Challenge:
We want to scale the number of running instances up and down dynamically, depending on user demand. Ideally, we’d like to schedule and manage these instances with Kubernetes, using the Nvidia Container Toolkit. The goal is to run multiple instances of the app on a single GPU.
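
To make the scaling target concrete, here is a minimal sketch of the arithmetic, assuming one instance per user and a fixed number of instances that fit on each GPU. The function name and the example numbers are hypothetical, not taken from our deployment.

```python
import math


def gpus_needed(active_users: int, instances_per_gpu: int) -> int:
    """Return how many GPUs are required when every user gets a dedicated instance."""
    if active_users < 0:
        raise ValueError("active_users must not be negative")
    if instances_per_gpu < 1:
        raise ValueError("instances_per_gpu must be at least 1")
    # Round up: a partially filled GPU still has to be provisioned.
    return math.ceil(active_users / instances_per_gpu)


if __name__ == "__main__":
    # Hypothetical load: 10 concurrent users, 4 instances per GPU.
    print(gpus_needed(10, 4))
```

The more instances a single GPU can host, the fewer nodes the scheduler has to bring up, which is why the partitioning question matters for us.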

What I’ve found so far:

  • MIG (Multi-Instance GPU):
    Doesn’t support Vulkan or DirectX APIs, so not suitable for real-time rendering with Unreal/Unity.
  • Nvidia vGPU:
    Requires real virtualization (full VMs), which feels like overkill for our container-based setup and adds significant complexity.
  • Docker GPU time-slicing:
    Doesn’t seem performant enough for real-time rendering workloads.

What I’m looking for:

  • “Real” GPU partitioning (ideally hardware-based, but container-friendly) that works with Vulkan/DirectX.
  • Preferably also with NVENC support for encoding the Pixel Streaming video. I read that only 2 streams can be encoded at the same time; maybe this can be worked around with software encoding or a pixel streaming server.
  • Ideally manageable via Kubernetes/Nvidia Container Toolkit.
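
As a rough way to reason about these requirements, here is a minimal sketch of a per-GPU capacity estimate, assuming the instance count is bounded by video memory and, if one applies, by a cap on concurrent NVENC sessions. The function name, the 6 GB per-instance footprint, and the session cap are hypothetical placeholders; only the 48 GB figure corresponds to an L40.

```python
from typing import Optional


def instances_per_gpu(
    vram_gb: float,
    vram_per_instance_gb: float,
    nvenc_session_limit: Optional[int] = None,
) -> int:
    """Estimate how many streaming instances fit on one GPU."""
    if vram_per_instance_gb <= 0:
        raise ValueError("vram_per_instance_gb must be positive")
    # Memory bound: how many instances fit into the available VRAM.
    by_memory = int(vram_gb // vram_per_instance_gb)
    if nvenc_session_limit is None:
        # No encoder cap assumed, so memory is the only limit.
        return by_memory
    # Otherwise the tighter of the two limits wins.
    return min(by_memory, nvenc_session_limit)


if __name__ == "__main__":
    # Hypothetical: 48 GB card, 6 GB per instance, no encoder cap.
    print(instances_per_gpu(48, 6))
    # Same card, with an assumed cap of 2 concurrent hardware encodes.
    print(instances_per_gpu(48, 6, nvenc_session_limit=2))
```

This ignores GPU compute headroom and frame-time targets, so it only gives an upper bound on what a single card could host.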

The Big Question:
What do you recommend for running multiple concurrent, isolated instances of an Unreal or Unity remote-rendering app on a single GPU, where each instance needs full graphics API support (Vulkan/DirectX) and ideally access to NVENC?
Are there solutions, best practices, or workarounds that you’d suggest? Or is this just not feasible with current Nvidia hardware/software?
Which Nvidia hardware is recommended? Would an L40 or L40S be the best fit?

Any experiences or links to relevant resources would be greatly appreciated!

Thanks in advance for your help!