Dear NVIDIA developer team,
This week I upgraded my graphics card from an RTX 2060 to an RTX 3060, because it has more VRAM and I wanted to run my deep learning experiments faster.
The problem is that I now cannot train at all with the new GPU due to a constant OOM error. I have tested that both PyTorch (1.7.1+cu110, 1.8.0+cu111) and tensorflow-gpu (2.4.3, CUDA 11.1) give the same OOM error.
From my observation, the GPU memory usage with tensorflow-gpu rises to 9.xx GB of the available 12 GB of VRAM (although it still ends in OOM), whereas with PyTorch I didn't observe any spike in GPU memory usage at all.
Hence, I am wondering: might this be an issue in the CUDA driver itself, which perhaps doesn't support the RTX 3060 yet, since the card is less than a month old?
Reproduce the issue
PyTorch
I have tried this and this, but without much help.
To test PyTorch, see here.
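Before running the full training script, a quick sanity check (a sketch, not from the original post) is whether the installed PyTorch binary actually ships kernels for the RTX 3060's compute architecture (sm_86):

```python
import torch

print(torch.__version__, torch.version.cuda)    # build version and bundled CUDA version
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))        # e.g. "NVIDIA GeForce RTX 3060"
    print(torch.cuda.get_device_capability(0))  # (8, 6) for an Ampere RTX 3060
    # Compute architectures this binary was compiled for; the RTX 3060 needs
    # sm_86, which the CUDA 11.1 wheels include but the CUDA 11.0 wheels do not.
    print(torch.cuda.get_arch_list())
```

If `sm_86` is missing from the arch list, the cu111 wheels rather than the cu110 ones are the likely fix.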
TensorFlow
To test TensorFlow:
test_tf.py (2.5 KB)
Error snapshot:
Hi @briliantnugraha,
thanks for raising this issue.
If I understand the use case correctly, you are seeing an OOM error on your 3060 using the PyTorch 1.8.0+CUDA11.1 binaries (pip wheels or conda binaries) by running the CIFAR10 script?
If so, could you run a quick test and try to allocate a single tensor on this device via:
import torch
x = torch.randn(1024**3, device='cuda')
print(x.shape)
and check if this also runs OOM.
This would allocate 4 GB on your device and should work fine.
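As a back-of-the-envelope check on that number (assuming the default float32 dtype of `torch.randn`):

```python
# torch.randn defaults to float32, i.e. 4 bytes per element
num_elements = 1024 ** 3            # number of elements in the test tensor
bytes_per_element = 4               # float32
total_gib = num_elements * bytes_per_element / 1024 ** 3
print(total_gib)                    # -> 4.0 (GiB)
```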
Since you are seeing an OOM using the CIFAR10 example, I guess the OOM might be a red herring, as this example should not use the complete device memory.
It seems that you’ve already allocated data on this device before running the code.
Could you empty the device and run:
import torch
print(torch.cuda.memory_summary())
x = torch.randn(1024**3, device='cuda')
print(torch.cuda.memory_summary())
Hello, I have the same problem as described, and I have tried the proposed test code:
The environment:
I’m running the code on Ubuntu 21.04, under PyCharm Pro.
My test code:
import torch

try:
    print(torch.cuda.memory_summary())
    x = torch.randn(1024**3, device='cuda')
    print(torch.cuda.memory_summary())
except Exception as ex:
    print(str(ex))
    print(torch.cuda.memory_summary())
The output:
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Active memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| GPU reserved memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Allocations | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Active allocs | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|===========================================================================|
Traceback (most recent call last):
File "/home/jero/Proyectos/EmocionesBasicas/emociones/services/test_cuda.py", line 4, in <module>
x = torch.randn(1024**3, device='cuda')
RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 5.81 GiB total capacity; 0 bytes already allocated; 3.94 GiB free; 0 bytes reserved in total by PyTorch)
(env) jero@nassat:~/Proyectos/EmocionesBasicas/emociones/services$ python test_cuda.py
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Active memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| GPU reserved memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Allocations | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Active allocs | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|===========================================================================|
CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 5.81 GiB total capacity; 0 bytes already allocated; 3.94 GiB free; 0 bytes reserved in total by PyTorch)
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 1 | cudaMalloc retries: 1 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Active memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| GPU reserved memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 0 B | 0 B | 0 B | 0 B |
| from large pool | 0 B | 0 B | 0 B | 0 B |
| from small pool | 0 B | 0 B | 0 B | 0 B |
|---------------------------------------------------------------------------|
| Allocations | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Active allocs | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 0 | 0 | 0 | 0 |
| from large pool | 0 | 0 | 0 | 0 |
| from small pool | 0 | 0 | 0 | 0 |
|===========================================================================|
When I run my project … the GPU memory usage is:
Do you know if I can move the last two processes out of the GPU? The last one is a web service that runs the models.
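One hedged option (not from this thread) for keeping a process such as the web service off the GPU is to hide the CUDA devices from it, via the standard `CUDA_VISIBLE_DEVICES` environment variable, before any CUDA-using library is imported:

```python
import os

# Hide all GPUs from this process; must be set BEFORE torch/tensorflow is
# imported. With no visible devices, torch.cuda.is_available() returns False,
# so the service falls back to CPU (assuming its code picks the device
# dynamically rather than hard-coding 'cuda').
os.environ["CUDA_VISIBLE_DEVICES"] = ""
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Equivalently, export `CUDA_VISIBLE_DEVICES=""` in the shell or service unit that launches the process.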
When I get the OOM, the memory usage is:
2021-08-26 08:37:48,875 log_emociones - ERROR - Exception detected while creating the Basic Emotions classifier: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 5.81 GiB total capacity; 418.75 MiB already allocated; 12.69 MiB free; 472.00 MiB reserved in total by PyTorch)
2021-08-26 08:37:48,875 log_emociones - ERROR - NVIDIA memory summary:
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 1 | cudaMalloc retries: 1 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 428797 KB | 428797 KB | 428797 KB | 0 B |
| from large pool | 428288 KB | 428288 KB | 428288 KB | 0 B |
| from small pool | 509 KB | 509 KB | 509 KB | 0 B |
|---------------------------------------------------------------------------|
| Active memory | 428797 KB | 428797 KB | 428797 KB | 0 B |
| from large pool | 428288 KB | 428288 KB | 428288 KB | 0 B |
| from small pool | 509 KB | 509 KB | 509 KB | 0 B |
|---------------------------------------------------------------------------|
| GPU reserved memory | 483328 KB | 483328 KB | 483328 KB | 0 B |
| from large pool | 481280 KB | 481280 KB | 481280 KB | 0 B |
| from small pool | 2048 KB | 2048 KB | 2048 KB | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 54530 KB | 54552 KB | 265212 KB | 210681 KB |
| from large pool | 52992 KB | 52992 KB | 263168 KB | 210176 KB |
| from small pool | 1538 KB | 2044 KB | 2044 KB | 505 KB |
|---------------------------------------------------------------------------|
| Allocations | 203 | 203 | 203 | 0 |
| from large pool | 75 | 75 | 75 | 0 |
| from small pool | 128 | 128 | 128 | 0 |
|---------------------------------------------------------------------------|
| Active allocs | 203 | 203 | 203 | 0 |
| from large pool | 75 | 75 | 75 | 0 |
| from small pool | 128 | 128 | 128 | 0 |
|---------------------------------------------------------------------------|
| GPU reserved segments | 21 | 21 | 21 | 0 |
| from large pool | 20 | 20 | 20 | 0 |
| from small pool | 1 | 1 | 1 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 19 | 19 | 20 | 1 |
| from large pool | 18 | 18 | 19 | 1 |
| from small pool | 1 | 1 | 1 | 0 |
|===========================================================================|