Hello,
We are experiencing persistent CUDA out of memory errors on a GKE node equipped with an NVIDIA T4 GPU, configured with time-sharing. This issue was first raised with Google Cloud Support, and they have suggested we consult the NVIDIA forums as it may be a driver-level problem.
The core issue is that we receive OOM errors when creating CUDA contexts, even though our monitoring shows that only about 30% of the GPU’s VRAM is in use.
Environment
- Cloud Platform: Google Kubernetes Engine (GKE)
- GPU: 1x NVIDIA Tesla T4 (16 GB VRAM)
- GPU Sharing Strategy: GKE time-sharing (with `max-shared-clients-per-gpu=48`)
- NVIDIA Driver Version: 570.133.20 (also confirmed the same issue on 535.247.01)
- Node OS Image: Container-Optimized OS `cos-117-18613-263-14`
Problem Description
Our service provides screen recording for video conferencing applications like Google Meet. We run multiple recording sessions in parallel on a single T4 GPU using GKE’s time-sharing feature.
The out of memory error consistently appears when we scale our application to around 15 Pods on a single node. Each pod runs a screen recording process using the gpu-screen-recorder tool, which leverages NVENC.
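For anyone trying to reproduce this, per-process GPU memory on the node can be snapshotted with `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader` as pods scale up. Below is a minimal sketch that tallies that output; the sample lines are illustrative, not captured from our node:

```python
# Tally per-process GPU memory from `nvidia-smi --query-compute-apps` CSV rows.
# The sample data below is illustrative, not from our node.

def tally_compute_apps(csv_lines):
    """Return (process_count, total_mib) from 'pid, used_memory' CSV rows."""
    total_mib = 0
    count = 0
    for line in csv_lines:
        line = line.strip()
        if not line:
            continue
        _pid, mem = [field.strip() for field in line.split(",")]
        total_mib += int(mem.split()[0])  # e.g. "305 MiB" -> 305
        count += 1
    return count, total_mib

sample = [
    "1234, 305 MiB",
    "1240, 310 MiB",
    "1251, 298 MiB",
]
procs, mib = tally_compute_apps(sample)
print(f"{procs} GPU processes, {mib} MiB total")  # 3 GPU processes, 913 MiB total
```

Running this against a real snapshot at the moment of failure would show how many recorder processes actually hold GPU memory, and how much, when the OOM fires.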
The Contradiction: OOM Errors vs. Monitoring Data
While the application and kernel logs report an Out of Memory condition, all our monitoring tools indicate ample available resources.
- Google Cloud Monitoring: reports GPU memory usage at only ~30% when the errors occur.
- DCGM Exporter: metrics for SM utilization and memory usage also show low consumption, with plenty of capacity available.
This suggests the OOM error is not related to a lack of VRAM capacity itself, but perhaps another resource limit.
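One way to read the ~30% figure: if each CUDA context carries a roughly fixed allocation overhead (the ~300 MiB used below is an assumed, illustrative figure, not something we measured), then 15 contexts alone would account for close to the observed usage on a 16 GB card, before any workload buffers. A back-of-envelope sketch:

```python
# Back-of-envelope: VRAM consumed by N CUDA contexts alone.
# ASSUMPTION: ~300 MiB fixed overhead per context (illustrative, not measured).

CONTEXT_OVERHEAD_MIB = 300   # assumed per-context overhead
TOTAL_VRAM_MIB = 16 * 1024   # T4: 16 GB

def context_overhead_fraction(n_contexts, overhead_mib=CONTEXT_OVERHEAD_MIB,
                              total_mib=TOTAL_VRAM_MIB):
    """Fraction of total VRAM consumed by context overhead alone."""
    return n_contexts * overhead_mib / total_mib

for n in (15, 48):
    frac = context_overhead_fraction(n)
    print(f"{n} contexts -> {n * CONTEXT_OVERHEAD_MIB} MiB ({frac:.0%} of VRAM)")
```

Under that assumption, 15 contexts come to roughly 27% of VRAM, which lines up with the ~30% our monitoring reports, yet context creation still fails. That is consistent with exhausting some other per-context resource rather than framebuffer capacity.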
Error Logs
We have captured the following error messages:
- Kernel log (`dmesg` on the GKE node):

```
[  361.456869] NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY] (0x00000051) returned from _memdescAllocInternal(pMemDesc) @ mem_desc.c:1353
```

- Application error log (from within the container):

```
[screen recorder] gsr error: gsr_cuda_load failed: unable to create CUDA context, error: out of memory (result: 2)
gsr error: gsr_get_supported_video_codecs_nvenc: failed to load cuda
Error: failed to query for supported video codecs
```

The key failure is the inability to create a new CUDA context.
Our Questions
- Why would the NVIDIA driver report an `Out of Memory` error when VRAM usage is only at ~30%?
- Is there a hard limit on the number of concurrent CUDA contexts that can be created in a time-sharing configuration, separate from the total available VRAM?
- Could this be a bug in the driver's memory management or resource scheduling when used with GKE's time-sharing scheduler?
Any insights or suggestions would be greatly appreciated. Thank you.