Limit tortoise-tts to less than 2GB memory?

I might be posting this in the wrong forum. I have CUDA and PyTorch set up to run tortoise-tts, but my GPU is old and only has 2 GB of memory, which isn’t enough.

Is there any way to limit the amount of memory CUDA can use?

Or is this something I need to look at solving at the PyTorch or tortoise-tts level?

Or is this something that can only be solved by buying better hardware?

I’ve messed with some tuning stuff, but I have no idea what I’m doing.

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256,garbage_collection_threshold:0.8
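
In case it matters, here is roughly the pattern I’m following to apply it, as a minimal sketch (the actual generation call is omitted):

import os

# The allocator reads PYTORCH_CUDA_ALLOC_CONF when torch initializes CUDA,
# so set it before the first "import torch" (or export it in the shell).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:256,garbage_collection_threshold:0.8"

import torch

# ... run the tortoise-tts generation here ...

# Afterwards, compare allocated vs. reserved memory to see whether
# fragmentation is really the problem.
print(torch.cuda.memory_summary())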

Whatever I do, it runs out of memory with an error.

RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 1.95 GiB total capacity; 1.52 GiB already allocated; 7.75 MiB free; 1.56 GiB allowed; 1.52 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I’m running CUDA 11.4.

How would that help? On the contrary, you would want CUDA to use as much of the memory provided by the hardware as possible. The good news: it already does that by default. A quick Google search brings up complaints from people running tortoise-tts on GPUs with 4 GB and 6 GB about out-of-memory errors, leading me to believe that 2 GB is just too small. Someone suggested setting autoregressive_batch_size to 1. No idea where you would set this; search through the available configuration settings.
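
If it is exposed through the Python API, it might look something like the sketch below. This is a guess: check whether the TextToSpeech constructor in the installed tortoise-tts version actually accepts an autoregressive_batch_size keyword before relying on it.

from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

# Hypothetical: construct the model with the smallest possible batch size.
# The keyword name may differ between tortoise-tts releases.
tts = TextToSpeech(autoregressive_batch_size=1)

# Use one of the bundled voices; "tom" is just an example.
voice_samples, conditioning_latents = load_voice("tom")

# The lightest preset also reduces memory pressure.
audio = tts.tts_with_preset(
    "Testing on a 2 GB card.",
    voice_samples=voice_samples,
    conditioning_latents=conditioning_latents,
    preset="ultra_fast",
)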

For resource issues, it is best practice to look through configuration settings starting at the top of the software stack, so YES.

Possibly, even likely, in light of people with 4 GB and 6 GB GPUs running out of memory. You would want to exhaust the configuration settings in the software stack first before taking that step, because even if you opt for a second-hand GPU, it is unlikely to come for free.

For reference, a GPU with 2 GB provides just enough memory to run the GUI and common applications of an average Windows system. How do I know this? I foolishly thought I could configure a Windows system with a powerful GPU for compute and an old GPU with 1 GB for GUI needs, only to experience out-of-memory conditions from running the GUI, some browser windows, and a PDF reader.

Thank you. I don’t run anything else on the GPU. I’m running Linux with no window manager. Even when I did have X running, it was only using about 70 MB.

I’ll try more tuning to see if I can get it there, but my goal is to let it use all of the memory, and that’s fine. I just don’t want it to ask for more once all the memory is used up.
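
To be clear about the kind of cap I mean, something like this sketch, assuming torch.cuda.set_per_process_memory_fraction is the right knob; it only limits what PyTorch may request, it doesn’t make the model need less:

import torch

# Cap PyTorch's caching allocator at roughly the full card on GPU 0.
# If the model needs more than this, the OOM just shows up at the cap
# instead of at the hardware limit.
torch.cuda.set_per_process_memory_fraction(0.95, device=0)

# Quick check of what is actually in use at any point.
print(torch.cuda.memory_allocated(0) / 2**20, "MiB allocated")
print(torch.cuda.memory_reserved(0) / 2**20, "MiB reserved")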

No luck tuning this to work. I’m on a low budget, but this is for sale on eBay; it gets me 16 GB of memory, and I think it’s CUDA compatible.

Yea or nay?

According to the TechPowerUp database, this is an 11-year-old part with compute capability 3.0, which is no longer supported by current and recent versions of CUDA. Avoid.

The first thing you need to figure out is the oldest GPU architecture supported by each component of your software stack. CUDA 12.x supports GPUs with compute capability 5.0 (Maxwell architecture) and higher.

If you can spare a bit over 150 USD, try to get an RTX 3060 with 12 GB or an RTX 2060 with 12 GB (the 2060 often has only 6 GB). This should give you more memory and a generation that supports Tensor Core computations, which helps PyTorch performance.

I bought one of these: MSI RTX 2060 VENTUS OC Specs | TechPowerUp GPU Database

It says it has a CUDA compute capability of 7.5.

To use the old hardware, I held Ubuntu back to an older version so that I could get CUDA 11.4 installed. But now that I have a more modern card, can I run the latest versions of CUDA?

Yes. CUDA 12.x supports GPUs with compute capability (CC) 5.0 or higher, which includes CC 7.5.
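
After installing the newer toolkit and a matching PyTorch build, a quick sanity check from Python (nothing tortoise-specific) would be:

import torch

# What the installed PyTorch build was compiled against.
print("PyTorch built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("Device:", torch.cuda.get_device_name(0))
    # Expect 7.5 for a Turing RTX 2060.
    print(f"Compute capability: {major}.{minor}")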

Hi revolt3d,
congratulations on your new card.
See e.g. here for compatibility: CUDA - Wikipedia

BTW, the GRID K1 was a compute capability 3.0 (Kepler architecture) device with 4 separate GK107 chips (NVIDIA GRID K1 Specs | TechPowerUp GPU Database), so each GPU could only have accessed 4 GB of the 16 GB of memory. You would have had problems with larger PyTorch models regardless of the architecture version.

The newest CUDA SDK (12.5) supports devices from compute capability 5.0 upwards, so you should be able to run the newest version for the next few years, probably at least up to a future 14.x. Turing is also the first consumer GPU generation built on the general (super-)architecture introduced with Volta (7.0) and kept ever since, so it is likely to stay supported by frameworks and SDKs for a long time, since that takes less effort than supporting older generations like Kepler, Maxwell or Pascal.

It also has Tensor Cores, although without support for sparse matrices (which are easily simulated with dense ones) or for the very new 4-, 6- and 8-bit floating-point types. Support for BF16 and TF32 on Turing is very unofficial (either experimental or an Nvidia business decision at the time to differentiate the generations): it exists in the hardware but is typically not exposed by the SDK and the frameworks when targeting 7.5. So you have to check that your neural networks use data types that are optimally accelerated. FP16 is well supported.
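
As an illustration of sticking to the well-supported types on Turing, a minimal sketch with a stand-in model (not tortoise-tts itself), pinning autocast to FP16:

import torch

device = "cuda"
model = torch.nn.Linear(1024, 1024).to(device)  # stand-in for a real network
x = torch.randn(8, 1024, device=device)

# On compute capability 7.5, FP16 is the well-supported Tensor Core path,
# so request float16 explicitly instead of relying on BF16/TF32 defaults.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

print(y.dtype)  # torch.float16 inside the autocast region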

I just wanted to close the loop on this. My new GPU is working great. I’m turning text into speech. My advice to anyone doing this: spend the money to buy a good GPU. I got this 2060 with 12 GB of memory for about $175 on eBay.

Excellent. Now test it with these two sentences:

“How to recognize speech”
“How to wreck a nice beach”

Can you hear the difference?

I think it does it pretty well. That’s my voice.
