CUDA Error with VSS + CV Pipeline on 4x L40S

Please provide the following information when creating a topic:

  • Hardware Platform (GPU model and numbers)

4x L40S GPUs

  • CPU Specs

2x Intel(R) Xeon(R) Gold 6448Y

  • System Memory

500 GB

  • Ubuntu Version

22.04.5 LTS

  • Kubernetes Version

MicroK8s v1.32.3 revision 8148

  • NVIDIA GPU Driver Version (valid for GPU only)

565.57.01

  • Nvidia GPU Operator Version

24.6.2

  • Issue Type( questions, new requirements, bugs)

CUDA error; please see the logs attached below.
error_memory_custom_cv.txt (11.3 KB)

  • How to reproduce the issue ? (This is for bugs. Including the command line used and other details for reproducing)

I have attached the overrides_cv.yaml file used to recreate the setup.
overrides_cv.txt (4.8 KB)
When we run summarization on a 20 sec video with 5 sec chunking, it runs fine.
But if we use a 1 min video with any chunking, it fails with a CUDA error (see error_memory_custom_cv.txt).
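For reference, the summarization request we run looks roughly like the sketch below. The endpoint and field names (notably chunk_duration and the placeholder model name) are assumptions about the VSS REST API and may differ per deployment:

  curl -s -X POST http://<vss-host>:<port>/summarize \
    -H "Content-Type: application/json" \
    -d '{"id": "<uploaded-file-id>", "prompt": "Summarize the video", "model": "<model-name>", "chunk_duration": 5}'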

  • Requirement details (This is for new requirement. Including the logs for the pods, the description for the pods)

Please look into the issue.

Could you first try disabling CV and running again?

Hi yuweiw,

Everything works fine with CV disabled.
Everything works fine with CV enabled and a small 20 sec video.

The out-of-memory error only occurs with CV enabled and a 1 min video.

We require the CV pipeline feature, as it gives better results for our use case.

The CV pipeline requires more models and GPU resources. Could you try using 8x L40S or 4x H100 (80 GB)?

Hi yuweiw,

We can’t increase the compute; we are limited to 4x L40S.

Can you take a look at the overrides_cv.yaml file and suggest any optimizations that could help us run the CV pipeline on 4x L40S?

You can try the approaches below. However, there is no guarantee that the deployment will succeed.

  1. Add the environment variables below to the NIM configuration:

  - name: NIM_LOW_MEMORY_MODE
    value: "1"
  - name: NIM_RELAX_MEM_CONSTRAINTS
    value: "1"

  2. Run the nvidia-smi command to check the utilization of resources, then allocate the resources more reasonably (see the monitoring example below).
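For example, the following refreshes per-GPU memory and utilization every 2 seconds while a request is being processed (standard nvidia-smi query options only):

  watch -n 2 nvidia-smi --query-gpu=index,name,memory.used,memory.total,utilization.gpu --format=csv

This shows which of the four L40S GPUs (48 GB each) is running closest to its memory limit when the CV pipeline is active.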

Even if it can be successfully deployed, the service will be extremely slow.

Hi yuweiw,

We reduced NUM_CV_CHUNKS_PER_GPU from 2 to 1, which let us run larger videos without hitting the CUDA out-of-memory error.

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
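          # Reduced from 2 to 1; this avoids the CUDA out-of-memory error on 4x L40S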
          - name: NUM_CV_CHUNKS_PER_GPU
            value: "1"