Hello folks,
I’ve been trying to run the Evo2-40B NIM on a virtual machine (VM) with a single H100 GPU, but I’m hitting a CUDA out-of-memory (VRAM) error. I deployed the NIM using the Docker method.
Error message:
{"error":"CUDA out of memory. Tried to allocate 704.00 MiB. GPU 0 has a total capacity of 79.19 GiB of which 631.06 MiB is free. Process 100 has 78.56 GiB memory in use. Of the allocated memory 69.28 GiB is allocated by PyTorch, and 8.69 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variable"}
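The error itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. If that is the intended workaround, I assume it would be passed into the container something like this (a sketch only; the image name/tag is a placeholder, not the actual NIM image):

```shell
# Attempting the allocator setting suggested in the OOM message.
# Image name below is a placeholder for the real Evo2-40B NIM image.
docker run --rm --gpus all \
  -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  <evo2-40b-nim-image>
```

I haven’t confirmed this helps, since the message also shows ~78.5 GiB already in use on an 80 GiB card, so fragmentation may not be the root cause.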
Is there a supported way to run Evo2-40B on a single H100 GPU, or does this model require more than 80 GB of VRAM?
Thanks!