Disk space not freed even after removing the project from AI Workbench

If you are reporting a bug or error, consider submitting a Support Bundle to aiworkbench-ea@nvidia.com. This will help us solve your issue more quickly.

Please tick the appropriate box to help us categorize your post:
[*] Bug or Error
[ ] Feature Request
[ ] Documentation Issue
[ ] Other

Hi Team,
I was trying out the example-hybrid-rag project in AI Workbench. The build was successful, but when I tried to open the chat I got a connection failure error. I tried to fetch the logs but could not find anything useful for debugging the issue.

Secondly, I tried to delete the project, but even after deleting it the disk space was not freed, so I no longer have sufficient space on my C: drive. Building the project took around 80 GB, and I'm not able to free that space up; I don't understand why the project still occupies disk space when I have deleted it. I'm fairly sure I had at least 80 GB free before building this project. I would appreciate it if this issue could be fixed as a priority.

Thanks and Regards,
Raghu

You need to delete the container image, which is typically quite large.

I believe you are on Windows.

If you go into the WSL distro, NVIDIA-Workbench, and run the command docker images, you will see the different images you have and their sizes.

Does that account for the size on disk?

Yes, you are right, I'm on Windows. I don't see the WSL distro from my command prompt; I only see WSL…

Sorry. Open the terminal from a Windows shell with:
wsl -d NVIDIA-Workbench

Then run:
docker images

Hi,

When you pull a container image to your machine, those image layers get cached on your system, which can take up space. Further, when you download large files like model weights from Hugging Face inside those containers, those resources may get cached as well. This can add up to several gigabytes of space.

When you delete a project, the build cache typically remains so that if you re-pull the project, you do not necessarily need to go through another full build. However, if you would like to delete the contents of the cache, you can try running the following:

docker system prune -a

This should also work with podman.

This should clear out the cache of all unused images and resources on your system and typically can free up a good amount of space. Hope this helps!
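
If you want to see what is actually using space inside the distro before and after pruning, a quick check (assuming Docker; podman has an equivalent podman system df) is:

wsl -d NVIDIA-Workbench
docker system df

This breaks down how much space images, containers, local volumes, and the build cache are consuming, and how much of that is reclaimable.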

Hi,
I ran the command that you mentioned; it says…

5mwmz5h3uvmvn0wz84335cetp
tinor552hil7pzi7wn4ke3nl8
z1qtgj2h3l7j0n0dp27wufz99
bj7hafflwh6socifa43qh2h1b

Total reclaimed space: 18.31GB

But I don’t see the disk space being cleared…

Br,
Raghu

Ah I see, so from this thread,

WSL2 works like a virtual machine; the guest OS doesn’t directly store individual files in your host filesystem – it has a dedicated virtual “disk” that is stored in a single large .vhdx file as seen by Windows.

This file grows dynamically as the guest OS writes data to new “disk” areas, but does not shrink when those areas are no longer used. For example, if the virtual disk had sectors 10000-20000 in use by a large file and you removed it, the space for those sectors (and their data, too) is still consumed by the VHDX file.

So over time, as Ubuntu places new files all over the disk area, the VHDX will continue growing until it eventually reaches the complete “disk size” that was given to the VM.

There are some other threads you can find online about optimizing the virtual disk used by WSL2, for example with optimize-vhd; a sketch of the general approach follows below. @bfurtaw can help you locate the virtual disk used by the NVIDIA-Workbench WSL distro.
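
For reference, the usual recipe from those threads (a sketch only; it assumes the distro's virtual disk is the ext4.vhdx under C:\ProgramData\NVIDIA Corporation\workbench discussed below, so adjust the path if yours differs) is to shut down WSL and compact the VHDX with diskpart from an elevated Command Prompt:

wsl --shutdown
diskpart

Then, inside diskpart:

select vdisk file="C:\ProgramData\NVIDIA Corporation\workbench\ext4.vhdx"
attach vdisk readonly
compact vdisk
detach vdisk
exit

Note that compacting only reclaims space the guest filesystem has already marked as free, so run the docker prune first.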

Oh ok, I tried to go through the threads but it's not clear to me what the solution is. Would it be possible to have a small debug call to clarify it / get it fixed? I want to create a new project and I might need more disk space… Also, I want to know how to get the disk space back to normal when I no longer need the project…

For sure. This is something we want to figure out as well.

Please email aiworkbench-ea@nvidia.com to set it up.

What is the size of the ext4.vhdx file in C:\ProgramData\NVIDIA Corporation\workbench ?
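
If you're not sure how to check, from a Windows Command Prompt:

dir "C:\ProgramData\NVIDIA Corporation\workbench"

This lists the ext4.vhdx file along with its size in bytes.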

One of the good things about AI Workbench is that it takes care of a lot of the admin-related work for you. For example, you don't have to manage or worry about WSL.

The bad news is that having everything abstracted away also means you’re kind of limited to solutions through the interface, and as has been pointed out, in this case the limitation is really with how WSL is greedy when it comes to disk space.

There are a lot of good reasons for this, but of course it is not ideal when you run into issues. Not only do you have to go down this rabbit hole, but because you're embarking on this adventure you're also going to encounter the pains related to all of the overhead that was so nicely abstracted away. To put it another way, the more you chase this down, the more likely the problem is to become larger and much more challenging.

As a general approach I'd recommend using your local computer as an orchestrator and running all of your workloads on remote instances. If you're not using remote instances, the overhead of Workbench is high and the benefit is low. That is even more true on Windows. I have a tiny little laptop, so I'm doing this with remote instances where $1 will allow me to run for hours. Aside from being easy and relatively cheap, I can also decide I want something more powerful, and it only takes a few minutes to set up. I don't have to worry about disk space or anything else about my local environment, and when I'm done I can just throw everything away (or commit it) and let AI Workbench manage all the painful things on the next round.

I’ll be publishing a blog article about this for a related hackathon so I’ll provide those instructions once that goes live. Remind me if you don’t see an update by Thursday.

All that said, the WSL disk concern is a known issue that unfortunately has been outstanding for some time. As with the other link, you might have some luck trying the things that have worked for others (but be prepared for additional challenges with Docker and/or Workbench when trying them):


Hi - thanks a lot for this feedback. Looking forward to the blog.

I think we actually found a fairly simple solution for this problem with this user. If that’s the case we can post it here.

Also, it would be great to chat with you to help us scope how to open things up for people to troubleshoot these kinds of issues on their own more easily.

If you want to chat, just email us at aiworkbench-ea@nvidia.com

Tyler

Hi,
Do you mean AI Workbench works better when using GPUs in remote locations, by just spending $1, rather than using the resources available locally?