Modulus release_22.09 - helmholtz example fails with RuntimeError: CUDA out of memory on GeForce GTX 1650

Hi,

On this card

lspci | grep NVIDIA
21:00.0 VGA compatible controller: NVIDIA Corporation TU117 [GeForce GTX 1650] (rev a1)
21:00.1 Audio device: NVIDIA Corporation Device 10fa (rev a1)

with this SMI

nvidia-smi
Fri Nov 4 09:36:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03    Driver Version: 470.141.03    CUDA Version: 11.4   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   32C    P8     4W /  75W |     73MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3207      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      3383      G   /usr/bin/gnome-shell                2MiB |
|    0   N/A  N/A      5130      C   python                             57MiB |
+-----------------------------------------------------------------------------+

I get the following error when an allocation of only 20 MiB fails, so I don't think it is related to the relatively weak GPU:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 119, in fallback_cuda_fuser
            def backward(grad_output):
                input_sigmoid = torch.sigmoid(self)
                return grad_output * (input_sigmoid * (1 + self * (1 - input_sigmoid)))
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            return result, backward
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.82 GiB total capacity; 897.88 MiB already allocated; 20.25 MiB free; 3.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It reads as if TorchScript (part of PyTorch) doesn’t have access to the memory reserved by PyTorch?
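
Side note: the last line of the error itself suggests a workaround for the "reserved >> allocated" case via PYTORCH_CUDA_ALLOC_CONF. Below is a minimal sketch of trying that hint; the 128 MiB split size is an arbitrary example value (not something taken from the Modulus docs), and as far as I understand it only works around fragmentation, it does not create more VRAM.

# Hedged sketch: set the allocator config before the first CUDA allocation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

import torch  # imported after setting the allocator configuration

# Any later CUDA allocation now uses the smaller maximum split size,
# which can reduce fragmentation when reserved memory far exceeds allocated memory.
x = torch.zeros(1, device="cuda")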

Hi @maric

The error message tells the whole story:
3.82 GiB total capacity; 897.88 MiB already allocated; 20.25 MiB free; 3.06 GiB reserved in total by PyTorch

PyTorch allocates its own chunk of VRAM for running, and what's left ("MiB free") is the maximum additional memory you can allocate at this point in the script. So you are running out of GPU memory, which is expected for a card with only 4 GB of RAM.

Be mindful that we develop on NVIDIA V100s, which have at least 16 GB of memory; this is stated in our user guide. That doesn't mean all problems need 16 GB (quite the opposite), but it does mean that for your hardware you will likely need to adjust.

The immediate options you have (see the sketch after this list) are:

  1. Lower the batch size of all constraints
  2. Reduce the size of the neural network (for example, lower the number of neurons in each layer)
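
To get a feel for how much each of these knobs matters, here is a rough standalone PyTorch sketch (not Modulus code; the layer count, the SiLU activation, and the example batch/width values are assumptions that only mimic typical fully connected defaults). It runs one forward/backward pass and reports peak CUDA memory:

# Standalone PyTorch sketch, not Modulus code: measures peak CUDA memory of one
# forward/backward pass through a fully connected network, to show how batch
# size and layer width drive activation memory.
import torch
import torch.nn as nn

def peak_memory_mib(batch_size, layer_size, nr_layers=6):
    device = torch.device("cuda")
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(device)

    # Simple MLP: 2 inputs -> nr_layers hidden layers of width layer_size -> 1 output.
    layers = [nn.Linear(2, layer_size), nn.SiLU()]
    for _ in range(nr_layers - 1):
        layers += [nn.Linear(layer_size, layer_size), nn.SiLU()]
    layers += [nn.Linear(layer_size, 1)]
    net = nn.Sequential(*layers).to(device)

    x = torch.randn(batch_size, 2, device=device, requires_grad=True)
    net(x).sum().backward()
    return torch.cuda.max_memory_allocated(device) / 2**20

if __name__ == "__main__":
    # Halving either the batch size or the layer width roughly halves the
    # activation memory, which is why both options help on a small card.
    for bs, width in [(4000, 512), (2000, 512), (4000, 256)]:
        print(f"batch={bs:5d}  width={width:3d}  ->  {peak_memory_mib(bs, width):8.1f} MiB")

The point of the sketch is only that activation memory scales roughly linearly in both the batch size and the layer width, so either reduction buys you headroom on a 4 GB card.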

Thanks @ngeneva, all clear, I had misread the error message: I thought "3.06 GiB reserved in total by PyTorch" was what Modulus has available in total, so that the requested 20.00 MiB would come out of that memory, while "20.25 MiB free" (almost nothing) was what is left for other processes.

