GPU CUDA out of memory error

I am getting the following error message in VS Code when I run a Jupyter notebook using Python 3.11.3 as the Jupyter kernel:

"name": "OutOfMemoryError",
"message": "CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 8.00 GiB total capacity; 7.00 GiB already allocated; 0 bytes free; 7.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
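For anyone reading the numbers in that message, here is a minimal sketch that pulls them out and computes the gap between reserved and allocated memory, which is what the fragmentation hint refers to. The helper names (`to_mib`, `parse_oom`) are made up for illustration, and the regular expressions assume the message wording shown above.

```python
import re

# Error text copied from the message above (trailing hint omitted).
MESSAGE = (
    "CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 8.00 GiB total "
    "capacity; 7.00 GiB already allocated; 0 bytes free; 7.13 GiB reserved in "
    "total by PyTorch)"
)

UNIT = r"(bytes|KiB|MiB|GiB)"
UNIT_TO_MIB = {"bytes": 1 / 2**20, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def to_mib(value, unit):
    """Convert a (number, unit) pair from the message into MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def parse_oom(message):
    """Extract the memory figures from an OOM message worded like the one above."""
    patterns = {
        "requested": rf"Tried to allocate ([\d.]+) {UNIT}",
        "total": rf"([\d.]+) {UNIT} total capacity",
        "allocated": rf"([\d.]+) {UNIT} already allocated",
        "free": rf"([\d.]+) {UNIT} free",
        "reserved": rf"([\d.]+) {UNIT} reserved",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{key}' in the message")
        stats[key] = to_mib(*match.groups())
    return stats


stats = parse_oom(MESSAGE)
# Memory the caching allocator holds but has not handed out to tensors.
slack_mib = stats["reserved"] - stats["allocated"]
print(f"requested: {stats['requested']:.2f} MiB, reserved minus allocated: {slack_mib:.2f} MiB")
```

Reserved exceeds allocated by only about 133 MiB here, far less than the 1024 MiB requested, so the "reserved >> allocated" fragmentation case does not obviously apply. The figures suggest that roughly 7 GiB is held by live tensors on GPU 0.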

I ran the following tests in sequence, disabling one GPU at a time:

Test 1.
NVIDIA GeForce RTX 2080 1 disabled; 2, 3 and 4 enabled. Ran the segment/mask step in the ‘Segment Anything Model’ Jupyter notebook in VS Code. Run successful, no errors.

Test 2.
NVIDIA GeForce RTX 2080 2 disabled; 1, 3 and 4 enabled. Ran the segment/mask step in the ‘Segment Anything Model’ Jupyter notebook in VS Code. Run successful, no errors.

Test 3.
NVIDIA GeForce RTX 2080 3 disabled; 1, 2 and 4 enabled. Ran the segment/mask step in the ‘Segment Anything Model’ Jupyter notebook in VS Code. Run unsuccessful, CUDA out of memory error (‘Test 3 error’ attached).

Test 4.
NVIDIA GeForce RTX 2080 4 disabled; 1, 2 and 3 enabled. Ran the segment/mask step in the ‘Segment Anything Model’ Jupyter notebook in VS Code. Run unsuccessful, unknown error (‘Test 4, 5 and 6 error’ attached).

Test 5.
NVIDIA GeForce RTX 2080 1 disabled; 2, 3 and 4 enabled. Ran the segment/mask step in the ‘Segment Anything Model’ Jupyter notebook in VS Code. Run unsuccessful, unknown error (‘Test 4, 5 and 6 error’ attached).

Test 6.
NVIDIA GeForce RTX 2080 2 disabled; 1, 3 and 4 enabled. Ran the segment/mask step in the ‘Segment Anything Model’ Jupyter notebook in VS Code. Run unsuccessful, unknown error (‘Test 4, 5 and 6 error’ attached).
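As an aside, the same kind of test can be approximated from inside the notebook by hiding one GPU from CUDA with the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below is only an illustration: `visible_devices` is a hypothetical helper, and it assumes the GPU numbers 1 to 4 used in the tests correspond to CUDA device indices 0 to 3, which may not match the actual ordering on this machine.

```python
import os


def visible_devices(total_gpus, disabled):
    """Build a CUDA_VISIBLE_DEVICES value that hides one GPU.

    `disabled` uses the 1-based numbering from the tests above; the mapping
    to 0-based CUDA indices is an assumption.
    """
    if not 1 <= disabled <= total_gpus:
        raise ValueError("disabled must be between 1 and total_gpus")
    return ",".join(str(i) for i in range(total_gpus) if i != disabled - 1)


# Equivalent of Test 3: hide GPU 3, keep 1, 2 and 4 visible.
# This has to be set before CUDA is initialised in the process,
# so the safest place is the first notebook cell, before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(4, 3)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Restarting the kernel between configurations gives each test a fresh process, which helps separate per-GPU effects from memory left over from earlier runs.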

To me, these results don’t point to a fault in a specific GPU, but to a problem that gets worse as the number of runs increases. Perhaps it is a memory cache issue, although I have added a line to clear the cache and the error persists. Has anyone come across this before?
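One thing worth checking for the cache theory: `torch.cuda.empty_cache()` only returns cached blocks that no tensor is using, so anything still referenced from a notebook variable stays allocated across runs. Below is a minimal sketch for logging memory per GPU before and after each run. The function names are made up for illustration, and the code falls back to an empty report if PyTorch or CUDA is not available.

```python
import gc

try:
    import torch
except ImportError:  # allows the sketch to run on a machine without PyTorch
    torch = None


def cuda_memory_report():
    """Return allocated and reserved memory (in MiB) for each visible GPU."""
    if torch is None or not torch.cuda.is_available():
        return []
    report = []
    for idx in range(torch.cuda.device_count()):
        report.append({
            "device": idx,
            "allocated_mib": torch.cuda.memory_allocated(idx) / 2**20,
            "reserved_mib": torch.cuda.memory_reserved(idx) / 2**20,
        })
    return report


def release_cached_memory():
    """Collect unreachable Python objects, then release unused cached blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return cuda_memory_report()


# Typical use in a notebook: delete large objects first, for example
# `del masks` (a hypothetical variable name), then call:
for entry in release_cached_memory():
    print(entry)
```

If `allocated_mib` keeps climbing from run to run even after this, something in the notebook is probably still holding references to GPU tensors, and restarting the kernel would be the quickest way to confirm it.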

Thanks,

Karl

I don’t think you’ll find many PyTorch experts here. You may get better results on a dedicated PyTorch forum like this one.

Thanks for your response, I have posted in the forum you mentioned.

Karl