Hardware Platform: 8x NVIDIA L40S GPUs (EC2 g6e.48xlarge instance)
System Memory: 1536 GB RAM (EC2 g6e.48xlarge specification)
Ubuntu Version: Ubuntu 24.04 (Deep Learning Base OSS NVIDIA Driver GPU AMI)
NVIDIA GPU Driver Version: 570.172.08; CUDA Version: 12.8
Issue Type: Bug
How to reproduce the issue: I’m encountering out-of-memory errors when deploying the VILA 1.5 40B VLM model via the VSS Docker Compose setup on the recommended hardware configuration. The deployment fails during model loading/initialization.
Command used:
docker-compose up
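While the stack starts, per-GPU memory can be watched in a second terminal. This is just the monitoring command I would use to observe the failure, not output taken from the attached logs:

# poll per-GPU memory every 2 seconds during startup
watch -n 2 nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv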
Docker Compose File:
compose.yaml_nvidiaForums.txt (3.3 KB)
Error Logs:
VSS_40bModel_ErrorLogs.txt (81.8 KB)
Additional Context:
Following the official deployment documentation (VSS - Deploy Using Docker Compose — Video Search and Summarization Agent)
Using the exact recommended EC2 instance type (g6e.48xlarge) with 8x L40S GPUs
AMI: Deep Learning Base OSS NVIDIA Driver GPU AMI (Ubuntu 24.04)
The error occurs consistently during model deployment
Questions:
Is there a known workaround for this out-of-memory issue on the recommended hardware?
Are there specific Docker Compose configuration parameters (such as the GPU reservation block sketched after these questions) that need adjustment for the 40B model?
Should we consider different memory optimization strategies or model sharding configurations?
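For reference, by Compose-level GPU configuration in the second question I mean the standard device reservation block below. This is a minimal sketch only (the vss-engine service name is a placeholder; the real service definitions are in the attached compose file), not the exact contents of my file:

services:
  vss-engine:                        # placeholder service name; see attached compose file
    environment:
      - NVIDIA_VISIBLE_DEVICES=all   # expose all 8 L40S GPUs to the container
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all             # or device_ids to pin specific GPUs
              capabilities: [gpu]

This block only controls which GPUs the container can see; my assumption is that actual sharding of the 40B model across the eight 48 GB L40S GPUs is configured inside the VSS containers rather than in Compose, which is what the third question is asking about.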
I have detailed logs and system configuration documentation available. Please let me know if you need any additional information to help diagnose and resolve this deployment issue.
The attached files include:
Complete docker-compose.yml file
Full error logs and system configuration document
Output of nvidia-smi and system resource information
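If a fresh capture is useful, these are the commands I would use to regenerate that information (a suggestion, not necessarily the exact commands behind the attached output):

# per-GPU inventory and memory headroom
nvidia-smi --query-gpu=index,name,driver_version,memory.total,memory.used --format=csv
# system RAM and Docker storage
free -h
df -h /var/lib/docker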