Unable to Run Evo2-40B on a Single H100 GPU Due to 'CUDA out of memory' Error

Hello folks,
I’ve been trying to run the Evo2-40B NIM on a virtual machine (VM) with a single H100 GPU, but I’m hitting a VRAM issue. I deployed the NIM using the Docker method.

Error message:

{"error":"CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 79.19 GiB of which 631.06 MiB is free. Process 100 has 78.56 GiB memory in use. Of the allocated memory 69.28 GiB is allocated by PyTorch, and 8.69 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variable"}

What is the solution to running Evo2-40B on a single H100 GPU?

Thanks!

@KindnessCUDA

Evo2-40B NIM requires the following resources, as mentioned in the documentation:

| GPU  | GPU Memory (GB) | Precision | # of GPUs | Disk Space (GB) | CPU RAM (GB) |
|------|-----------------|-----------|-----------|-----------------|--------------|
| H100 | 80              | Mixed     | 2         | 100             | 128          |
| H200 | 144             | Mixed     | 1         | 100             | 128          |

All memory and disk values are in GB; the Disk Space figure covers both the container and the model weights. In short, Evo2-40B does not fit on a single 80 GB H100 — you need either two H100s or one H200.
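On a two-H100 machine, the Docker deployment can be sketched as below. This is a sketch based on the standard NIM Docker workflow, not the exact Evo2 command: the image tag, port, and shared-memory size are assumptions, so check them against the Evo2-40B NIM documentation before running.

```shell
# Hypothetical sketch: substitute the exact Evo2-40B NIM image tag from NGC.
export NGC_API_KEY="<your NGC API key>"

docker run --rm -it \
  --gpus 2 \              # give the container both H100s (80 GB each)
  --shm-size=16g \        # assumed value; large models often need extra shared memory
  -e NGC_API_KEY \
  -p 8000:8000 \          # assumed service port
  nvcr.io/nim/<evo2-40b-image-tag>
```

`--gpus 2` asks Docker for two GPU devices; `--gpus all` works too if the VM has exactly the two H100s you want to use. The `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` hint in the error message only mitigates fragmentation — it cannot make a model that needs roughly 160 GB of GPU memory fit into 80 GB.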
