Native Time-Slicing vs vGPU latency due to context switching

How close in performance is native time-slicing via the NVIDIA GPU Operator to vGPU (vWS), for both graphics and compute workloads? And which of the two has lower context-switch latency for graphics?
What prompts the question: does the absence of a hypervisor reduce overhead when using time-slicing on bare-metal Kubernetes with the GPU Operator, and therefore yield lower latency? Or does vGPU (via the vGPU Manager) handle context switching more efficiently than the GPU Operator's native time-slicing?
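For context, by "native time-slicing" I mean the device-plugin-level sharing the GPU Operator supports, configured through a ConfigMap roughly like the sketch below (the `replicas` count and names here are just illustrative, not my actual setup):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  # Profile name referenced by the ClusterPolicy / node label
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          # Advertise each physical GPU as 4 schedulable replicas;
          # pods sharing a GPU are context-switched by the driver.
          - name: nvidia.com/gpu
            replicas: 4
```

With this, several pods land on one GPU and the driver round-robins between their contexts, which is exactly the switching overhead I'm asking about versus the vGPU scheduler.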
I'd appreciate your perspective. Thanks.