VSS Docker Compose Deployment Issue with the VILA 1.5 40B VLM Model - Out-of-Memory Error on EC2 g6e.48xlarge

Hardware Platform: 8x NVIDIA L40S GPUs (EC2 g6e.48xlarge instance)

System Memory: 1536 GB RAM (EC2 g6e.48xlarge specification)

Ubuntu Version: Ubuntu 24.04 (Deep Learning Base OSS NVIDIA Driver GPU AMI)

NVIDIA GPU Driver Version: 570.172.08; CUDA Version: 12.8

Issue Type: Bugs

How to reproduce the issue: I’m encountering out-of-memory errors when attempting to deploy the VILA 1.5 40B VLM model using the VSS Docker Compose deployment on the recommended hardware configuration. The deployment fails during model loading/initialization.

Command used:

docker-compose up
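
If your Docker installation ships only the Compose v2 plugin (no standalone docker-compose binary), the equivalent invocation, using the attached compose.yaml, would be:

docker compose -f compose.yaml up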

Docker Compose File:

compose.yaml_nvidiaForums.txt (3.3 KB)

Error Logs:

VSS_40bModel_ErrorLogs.txt (81.8 KB)

Additional Context:

Following the official deployment documentation: VSS - Deploy Using Docker Compose — Video Search and Summarization Agent

Using the exact recommended EC2 instance type (g6e.48xlarge) with 8x L40S GPUs

AMI: Deep Learning Base OSS NVIDIA Driver GPU AMI (Ubuntu 24.04)

The error occurs consistently during model deployment

Questions:

Is there a known workaround for this out-of-memory issue on the recommended hardware?

Are there specific Docker Compose configuration parameters that need adjustment for the 40B model?

Should we consider different memory optimization strategies or model sharding configurations?

I have detailed logs and system configuration documentation available. Please let me know if you need any additional information to help diagnose and resolve this deployment issue.

The attached files contain:

Complete docker-compose.yml file

Full error logs and system configuration document

Output of nvidia-smi and system resource information

Have you modified the “export NVIDIA_VISIBLE_DEVICES=0,1,2” in the .env file?

Yes, I kept it as 0, and as part of testing I also tried different numbers of GPUs like 0,1,2,3,4,…

export NVIDIA_VISIBLE_DEVICES=0,1,2 means that you are using GPU 0, GPU 1, and GPU 2. If you are using L40S GPUs, you can try setting the parameters in the .env file like below.

#Set VLM to NVILA
export VLM_MODEL_TO_USE=nvila
export MODEL_PATH=ngc:nvidia/tao/nvila-highres:nvila-lite-15b-highres-lita

#Adjust misc configs if needed
export DISABLE_GUARDRAILS=false
export NVIDIA_VISIBLE_DEVICES=0,1,2 #For L40S Deployment

.env_issue_topic.txt (1.2 KB)

We have used the NGC key and tried different combinations, but we are still unable to deploy the 40B model. We did deploy the 15B model successfully earlier. We also tried the 35B NIM container model earlier and saw a performance gap in the VILA API responses provided by NVIDIA, so we wanted to try the 40B model that ships as part of the VSS blueprint.

Or is there any other way to access this 40B model?

We have also used 8x L40S-based instances but saw OOM issues with both the Helm chart and Docker Compose deployment methods. I shared the error logs and system details for those earlier.

Could you attach your deployment topology? Theoretically, three L40S GPUs should be sufficient for the VLM. If you have deployed the LLM on GPUs 0,1,2,3, you can set NVIDIA_VISIBLE_DEVICES=4,5,6.
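
As a rough back-of-envelope check (approximate numbers, not from the VSS docs): a 40B-parameter model in FP16 needs about 40e9 params × 2 bytes ≈ 80 GB for the weights alone, plus KV cache and activation memory, while a single L40S has 48 GB. The engine therefore has to be sharded (e.g. tensor-parallel) across at least two to three GPUs to load at all; if the container ends up pinned to a single visible GPU, an OOM during loading is expected. A sketch of the .env change for the split above, assuming the LLM occupies GPUs 0-3:

export NVIDIA_VISIBLE_DEVICES=4,5,6 #VLM on GPUs 4-6; GPUs 0-3 stay with the LLM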

Hi @yuweiw,

Thanks for the suggestion. I need to clarify our deployment approach and share additional details:

Our Current Setup:

  • We’re using a hybrid approach with the Docker Compose method
  • LLM & Embedding: Using API calls (not local models), so no GPU allocation is needed for these
  • VLM only: Trying to deploy the 40B model locally using the available GPUs
  • In our .env file, we only see one NVIDIA_VISIBLE_DEVICES parameter since we’re not running the LLM locally

Current .env configuration:

  • Using the .env file I tagged earlier in the conversation
  • All GPUs should be available for VLM since LLM/embedding are via API calls

Persistent Issues:

  • Still encountering OOM errors even with the 8x L40S setup (see the quick GPU-memory check after this list)
  • Tried both helm chart and docker compose deployment methods
  • 15B model works fine, but 40B model consistently fails
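
One quick, generic check (not VSS-specific) while the 40B engine is loading is to watch per-GPU memory, to see whether the model is actually sharding across all eight GPUs or piling onto one:

watch -n 1 nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv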

Additional Files:
I’m attaching:

  1. overrides.yaml file with our current configuration, plus error logs showing the OOM issues we hit with the Helm chart deployment.
  2. config.yaml used for the Docker Compose method.

Could you please review these files and suggest if there are any memory optimization parameters or configuration modifications we can make to successfully deploy the 40B model?

Is there something specific in our hybrid approach that might be causing these memory issues?

Thanks for your continued support!

helm chart deployment process.docx (286.5 KB)

config.yaml_docker compose method.txt (3.1 KB)

You can try setting VLM_BATCH_SIZE to 1 in your override file (see the sketch below).
Additionally, I did not see any error messages in your Helm chart file.
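
For illustration, a sketch of what that override might look like, assuming your overrides.yaml follows the env-list pattern used by the VSS Helm chart (cross-check the key path against your existing file, as it can differ between chart versions):

vss:
  applicationSpecs:
    vss-deployment:
      containers:
        vss:
          env:
          - name: VLM_BATCH_SIZE
            value: "1"

For the Docker Compose method, the equivalent would be adding export VLM_BATCH_SIZE=1 to the .env file.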

Also, could you try using the engine file directly? You can refer to our vss-configuration-vila-engine-ngc-resource to learn how to use that.
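
For reference, a sketch of the .env changes that approach might involve, assuming MODEL_PATH accepts a local directory containing the prebuilt engine (the exact NGC resource name and any additional variables are in the linked doc; the path below is a placeholder):

export VLM_MODEL_TO_USE=vila-1.5
export MODEL_PATH=/path/to/downloaded/vila-engine #local engine directory instead of an ngc: URI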