Modulus release_22.09 - helmholtz example fails with RuntimeError: CUDA out of memory on GeForce GTX 1650

Hi,

On this card

lspci | grep NVIDIA
21:00.0 VGA compatible controller: NVIDIA Corporation TU117 [GeForce GTX 1650] (rev a1)
21:00.1 Audio device: NVIDIA Corporation Device 10fa (rev a1)

with this SMI

nvidia-smi
Fri Nov 4 09:36:22 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03    Driver Version: 470.141.03    CUDA Version: 11.4   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    Off  | 00000000:21:00.0 Off |                  N/A |
| 30%   32C    P8     4W /  75W |     73MiB /  3911MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3207      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      3383      G   /usr/bin/gnome-shell                2MiB |
|    0   N/A  N/A      5130      C   python                             57MiB |
+-----------------------------------------------------------------------------+

I get the following error when an allocation of only 20 MiB fails, so I don't think it is related to the relatively weak GPU:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 119, in fallback_cuda_fuser
            def backward(grad_output):
                input_sigmoid = torch.sigmoid(self)
                return grad_output * (input_sigmoid * (1 + self * (1 - input_sigmoid)))
                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            return result, backward
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 3.82 GiB total capacity; 897.88 MiB already allocated; 20.25 MiB free; 3.06 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It reads as if TorchScript (part of PyTorch) doesn’t have access to the memory reserved by PyTorch?
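
Side note: the last line of the error itself suggests a workaround for the "reserved >> allocated" case via PYTORCH_CUDA_ALLOC_CONF. Below is a minimal sketch of trying that hint; the 128 MiB split size is an arbitrary example value (not something taken from the Modulus docs), and as far as I understand it only works around fragmentation, it does not create more VRAM.

# Hedged sketch: set the allocator config before the first CUDA allocation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value

import torch  # imported after setting the allocator configuration

# Any later CUDA allocation now uses the smaller maximum split size,
# which can reduce fragmentation when reserved memory far exceeds allocated memory.
x = torch.zeros(1, device="cuda")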

Hi @maric

The error message tells the whole story:
3.82 GiB total capacity; 897.88 MiB already allocated; 20.25 MiB free; 3.06 GiB reserved in total by PyTorch

PyTorch allocates its own chunk of VRAM for running, and what's left ("MiB free") is the maximum additional memory you can allocate at this point in the script. So you are running out of GPU memory, which is expected for a card with only 4 GB of RAM.

Be mindful that we develop on NVIDIA V100s, which have at least 16 GB of memory; this is stated in our user guide. That doesn't mean all problems need 16 GB (quite the opposite), but it does mean that for your hardware you will likely need to adjust.

The immediate options you have (see the sketch after this list) are:

  1. Lower the batch size of all constraints
  2. Reduce the size of the neural network (for example, lower the number of neurons in each layer)
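
To get a feel for how much each of these knobs matters, here is a rough standalone PyTorch sketch (not Modulus code; the layer count, the SiLU activation, and the example batch/width values are assumptions that only mimic typical fully connected defaults). It runs one forward/backward pass and reports peak CUDA memory:

# Standalone PyTorch sketch, not Modulus code: measures peak CUDA memory of one
# forward/backward pass through a fully connected network, to show how batch
# size and layer width drive activation memory.
import torch
import torch.nn as nn

def peak_memory_mib(batch_size, layer_size, nr_layers=6):
    device = torch.device("cuda")
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(device)

    # Simple MLP: 2 inputs -> nr_layers hidden layers of width layer_size -> 1 output.
    layers = [nn.Linear(2, layer_size), nn.SiLU()]
    for _ in range(nr_layers - 1):
        layers += [nn.Linear(layer_size, layer_size), nn.SiLU()]
    layers += [nn.Linear(layer_size, 1)]
    net = nn.Sequential(*layers).to(device)

    x = torch.randn(batch_size, 2, device=device, requires_grad=True)
    net(x).sum().backward()
    return torch.cuda.max_memory_allocated(device) / 2**20

if __name__ == "__main__":
    # Halving either the batch size or the layer width roughly halves the
    # activation memory, which is why both options help on a small card.
    for bs, width in [(4000, 512), (2000, 512), (4000, 256)]:
        print(f"batch={bs:5d}  width={width:3d}  ->  {peak_memory_mib(bs, width):8.1f} MiB")

The point of the sketch is only that activation memory scales roughly linearly in both the batch size and the layer width, so either reduction buys you headroom on a 4 GB card.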

Thanks @ngeneva, all clear, I had misread the error message: I thought "3.06 GiB reserved in total by PyTorch" was what Modulus has available in total, so that the requested 20.00 MiB would come out of that memory, while "20.25 MiB free" (almost nothing) was what is left for other processes.

