Limit tortoise-tts to less than 2GB memory?

I might be posting this in the wrong forum. I have CUDA and PyTorch set up to run tortoise-tts, but my GPU is old and only has 2 GB of memory, which isn’t enough.

Is there any way to limit the amount of memory CUDA can use?

Or is this something I need to look at solving at the PyTorch or tortoise-tts level?

Or is this something that can only be solved by buying better hardware?

I’ve messed with some tuning stuff, but I have no idea what I’m doing.

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256,garbage_collection_threshold:0.8
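
In case it matters, here is roughly the pattern I’m following to apply it, as a minimal sketch (the actual generation call is omitted):

import os

# The allocator reads PYTORCH_CUDA_ALLOC_CONF when torch initializes CUDA,
# so set it before the first "import torch" (or export it in the shell).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:256,garbage_collection_threshold:0.8"

import torch

# ... run the tortoise-tts generation here ...

# Afterwards, compare allocated vs. reserved memory to see whether
# fragmentation is really the problem.
print(torch.cuda.memory_summary())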

Whatever I do, it runs out of memory with an error.

RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 1.95 GiB total capacity; 1.52 GiB already allocated; 7.75 MiB free; 1.56 GiB allowed; 1.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I’m running CUDA 11.4.

How would that help? On the contrary, you would want CUDA to use as much of the memory provided by the hardware as possible. The good news: it already does that by default. A quick Google search brings up complaints from people running tortoise-tts on GPUs with 4 GB and 6 GB about out-of-memory errors, leading me to believe that 2 GB is just too small. Someone suggested setting autoregressive_batch_size to 1. No idea where you would set this; search through the available configuration settings.
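
If it is exposed through the Python API, it might look something like the sketch below. This is a guess: check whether the TextToSpeech constructor in the installed tortoise-tts version actually accepts an autoregressive_batch_size keyword before relying on it.

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

# Hypothetical: construct the model with the smallest possible batch size.
# The keyword name may differ between tortoise-tts releases.
tts = TextToSpeech(autoregressive_batch_size=1)

# Use one of the bundled voices; "tom" is just an example.
voice_samples, conditioning_latents = load_voice("tom")

# The lightest preset also reduces memory pressure.
audio = tts.tts_with_preset(
    "Testing on a 2 GB card.",
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
    preset="ultra_fast",
)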

For resource issues, it is best practice to look through configuration settings starting at the top of the software stack, so YES.

Possibly, even likely, in light of people with 4 GB and 6 GB GPUs running out of memory. You would want to exhaust the configuration settings in the software stack first before taking that step, because even if you opt for a second-hand GPU, it is unlikely to come for free.

For reference, a GPU with 2 GB provides just enough memory to run the GUI and common applications of an average Windows system. How do I know this? I foolishly thought I could configure a Windows system with a powerful GPU for compute and an old GPU with 1 GB for GUI needs, only to experience out-of-memory conditions from running the GUI, some browser windows, and a PDF reader.

Thank you. I don’t run anything else on the GPU. I’m running Linux with no window manager. Even when I did have X running, it was only using about 70 MB.

I’ll try more tuning to see if I can get it there, but my goal is to let it use all of the memory, and that’s fine. I just don’t want it to ask for more once all the memory is used up.
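
To be clear about the kind of cap I mean, something like this sketch, assuming torch.cuda.set_per_process_memory_fraction is the right knob; it only limits what PyTorch may request, it doesn’t make the model need less:

import torch

# Cap PyTorch's caching allocator at roughly the full card on GPU 0.
# If the model needs more than this, the OOM just shows up at the cap
# instead of at the hardware limit.
torch.cuda.set_per_process_memory_fraction(0.95, device=0)

# Quick check of what is actually in use at any point.
print(torch.cuda.memory_allocated(0) / 2**20, "MiB allocated")
print(torch.cuda.memory_reserved(0) / 2**20, "MiB reserved")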

No luck tuning this to work. I’m on a low budget, but this is for sale on eBay; it gets me 16 GB of memory, and I think it’s CUDA compatible.

Yea or nay?

According to the TechPowerUp database, this is an 11-year-old part with compute capability 3.0, which is no longer supported by current and recent versions of CUDA. Avoid.

The first thing you need to figure out is the oldest GPU architecture supported by each component of your software stack. CUDA 12.x supports GPUs with compute capability 5.0 (Maxwell architecture) and higher.

If you can spare a bit over 150 USD, try to get an RTX 3060 with 12 GB or an RTX 2060 with 12 GB (the 2060 often has only 6 GB). This should give you more memory and a generation that supports Tensor Core computations, which helps PyTorch performance.

I bought one of these: MSI RTX 2060 VENTUS OC Specs | TechPowerUp GPU Database

It says it has a CUDA compute capability of 7.5.

To use the old hardware, I held Ubuntu back to an older version so that I could get CUDA 11.4 installed. But now that I have a more modern card, can I run the latest versions of CUDA?

Yes. CUDA 12.x supports GPUs with compute capability (CC) 5.0 or higher, which includes CC 7.5.
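
After installing the newer toolkit and a matching PyTorch build, a quick sanity check from Python (nothing tortoise-specific) would be:

import torch

# What the installed PyTorch build was compiled against.
print("PyTorch built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("Device:", torch.cuda.get_device_name(0))
    # Expect 7.5 for a Turing RTX 2060.
    print(f"Compute capability: {major}.{minor}")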

Hi revolt3d,
congratulations on your new card.
See e.g. here for compatibility: CUDA - Wikipedia

BTW, the GRID K1 was a compute capability 3.0 (Kepler architecture) device with 4 separate GK107 chips (NVIDIA GRID K1 Specs | TechPowerUp GPU Database), so each GPU could only have accessed 4 GB of the 16 GB of memory. You would have had problems with larger PyTorch models regardless of the architecture version.

The newest CUDA SDK (12.5) supports devices from compute capability 5.0 upwards, so you should be able to run the newest version for the next few years, probably at least up to a future 14.x. Turing is also the first consumer GPU generation built on the general (super-)architecture introduced with Volta (7.0) and kept ever since, so it is likely to stay supported by frameworks and SDKs for a long time, since that takes less effort than supporting older generations like Kepler, Maxwell or Pascal.

It also has Tensor Cores, although without support for sparse matrices (which are easily simulated with dense ones) or for the very new 4-, 6- and 8-bit floating-point types. Support for BF16 and TF32 on Turing is very unofficial (either experimental or an Nvidia business decision at the time to differentiate the generations): it exists in the hardware but is typically not exposed by the SDK and the frameworks when targeting 7.5. So you have to check that your neural networks use data types that are optimally accelerated. FP16 is well supported.
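
As an illustration of sticking to the well-supported types on Turing, a minimal sketch with a stand-in model (not tortoise-tts itself), pinning autocast to FP16:

import torch

device = "cuda"
model = torch.nn.Linear(1024, 1024).to(device)  # stand-in for a real network
x = torch.randn(8, 1024, device=device)

# On compute capability 7.5, FP16 is the well-supported Tensor Core path,
# so request float16 explicitly instead of relying on BF16/TF32 defaults.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

print(y.dtype)  # torch.float16 inside the autocast region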

I just wanted to close the loop on this. My new GPU is working great. I’m turning text into speech. My advice to anyone doing this: spend the money to buy a good GPU. I got this 2060 with 12 GB of memory for about $175 on eBay.

Excellent. Now test it with these two sentences:

“How to recognize speech”
“How to wreck a nice beach”

Can you hear the difference?

I think it does it pretty well. That’s my voice.
