Text generation webui

I am trying to download TheBloke/Llama-2-70B-GGUF using the following command
./run.sh $(./autotag text-generation-webui)
The download was working fine until it bombed out.

I have these files:

jetson-containers/data/models/text-generation-webui/TheBloke_Llama-2-70b-Chat-GGUF$ ls  -l
total 304196896
-rw-r--r-- 1 root root          29 Dec  2 16:25 config.json
-rw-r--r-- 1 root root        1508 Dec  5 19:32 huggingface-metadata.txt
-rw-r--r-- 1 root root        7020 Dec  2 16:25 LICENSE.txt
-rw-r--r-- 1 root root 29279253408 Dec  2 19:46 llama-2-70b-chat.Q2_K.gguf
-rw-r--r-- 1 root root 36147835808 Dec  4 21:00 llama-2-70b-chat.Q3_K_L.gguf
-rw-r--r-- 1 root root 33186657184 Dec  5 00:12 llama-2-70b-chat.Q3_K_M.gguf
-rw-r--r-- 1 root root 29919294368 Dec  5 20:48 llama-2-70b-chat.Q3_K_S.gguf
-rw-r--r-- 1 root root 17709331741 Dec  5 22:14 llama-2-70b-chat.Q4_0.gguf
-rw-r--r-- 1 root root  3659898967 Dec  5 02:52 llama-2-70b-chat.Q4_K_M.gguf
-rw-r--r-- 1 root root 13867898835 Dec  5 04:12 llama-2-70b-chat.Q4_K_S.gguf
-rw-r--r-- 1 root root  4100511455 Dec  5 04:36 llama-2-70b-chat.Q5_0.gguf
-rw-r--r-- 1 root root 48753767328 Dec  5 09:30 llama-2-70b-chat.Q5_K_M.gguf
-rw-r--r-- 1 root root 30574407680 Dec  5 12:28 llama-2-70b-chat.Q5_K_S.gguf
-rw-r--r-- 1 root root 36700160000 Dec  5 16:10 llama-2-70b-chat.Q6_K.gguf-split-a
-rw-r--r-- 1 root root 19887207328 Dec  5 18:05 llama-2-70b-chat.Q6_K.gguf-split-b
-rw-r--r-- 1 root root  7711227904 Dec  5 18:55 llama-2-70b-chat.Q8_0.gguf-split-a
-rw-r--r-- 1 root root       28111 Dec  2 16:25 README.md
-rw-r--r-- 1 root root        4766 Dec  2 16:25 USE_POLICY.md

This is where the confusion starts.

You will notice I have the directory TheBloke_Llama-2-70b-Chat-GGUF.

Should it be TheBloke_Llama-2-70B-Chat-GGUF (capital B or lowercase b)?
In my confusion I have renamed this directory a couple of times, but I can't get the download to continue.

If I use the lowercase b in the download, this is the reply:

Traceback (most recent call last):
  File "/opt/text-generation-webui/modules/ui_model_menu.py", line 223, in download_model_wrapper
    links, sha256, is_lora = downloader.get_download_links_from_huggingface(model, branch, text_only=False)
  File "/opt/text-generation-webui/download-model.py", line 67, in get_download_links_from_huggingface
    r.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/TheBloke_Llama-2-70b-Chat-GGUF/tree/main


If I rename the directory using a capital B:

Traceback (most recent call last):
  File "/opt/text-generation-webui/modules/ui_model_menu.py", line 223, in download_model_wrapper
    links, sha256, is_lora = downloader.get_download_links_from_huggingface(model, branch, text_only=False)
  File "/opt/text-generation-webui/download-model.py", line 67, in get_download_links_from_huggingface
    r.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/TheBloke_Llama-2-70B-Chat-GGUF/tree/main

Can somebody help me out of this HOLE?

@paulrrh it appears that you attempted to download the entire model repo, including all the quants, which for a 70B model takes up a hefty amount of disk space, and I wouldn't be surprised if you're running low on storage. You only need to download the specific quant(s) that you want to use.
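A quick way to see how much space the partial repo is using, and how much is left on that filesystem (a generic check, not specific to jetson-containers; the directory path is the one from your listing):

```shell
# Total size of the downloaded repo directory, and free space on the filesystem.
# If the directory was already renamed or removed, say so instead of erroring out.
MODEL_DIR="jetson-containers/data/models/text-generation-webui/TheBloke_Llama-2-70b-Chat-GGUF"
du -sh "$MODEL_DIR" 2>/dev/null || echo "not found: $MODEL_DIR"
df -h .
```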

Instead, I would just move llama-2-70b-chat.Q4_K_M.gguf (or your desired quantized model) into jetson-containers/data/models/text-generation-webui and remove the rest:

cd jetson-containers/data/models/text-generation-webui
mv TheBloke_Llama-2-70b-Chat-GGUF/llama-2-70b-chat.Q4_K_M.gguf .
rm -rf TheBloke_Llama-2-70b-Chat-GGUF

In the future with GGUF models, you can use the --specific-file option of oobabooga's download-model.py script to download just the quant you want, or just use wget with the Hugging Face URL:

cd jetson-containers/data/models/text-generation-webui
wget https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGUF/resolve/main/llama-2-70b-chat.Q4_K_M.gguf
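On a slow connection it is also worth knowing (my addition, based on standard wget behaviour, not something from this thread) that wget's -c flag resumes a partial download instead of restarting it. A sketch, building the same single-file URL from the repo and filename used above:

```shell
# Build the direct-download URL for one quant from its repo and filename.
REPO=TheBloke/Llama-2-70B-Chat-GGUF
FILE=llama-2-70b-chat.Q4_K_M.gguf
URL="https://huggingface.co/$REPO/resolve/main/$FILE"
echo "$URL"
# -c continues a partially downloaded file rather than starting from scratch:
# wget -c "$URL"
```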

Thanks for your help yet again.

I’ll take your advice.
Because I have such a slow internet connection, I wrongly thought I could download everything in one go.
I will give your suggestion a try.

Thanks

OK, good luck Paul! Note that what I said above was specific to llama.cpp GGUF models (those are self-contained, and you only need one .gguf file to run them). If you use other model formats like AutoGPTQ, ExLlama, Transformers, etc., you would typically download the whole repo (but those repos also typically don't include many quantization variants, so their size is smaller).
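Since a single .gguf file is all you need, a quick sanity check after a flaky download (my own suggestion, not from the thread) is to look at the file header: a valid GGUF file starts with the 4-byte ASCII magic "GGUF". This only catches files that are corrupted or empty from the start; it won't detect a file truncated at the end.

```shell
# Check whether a file begins with the GGUF magic bytes.
check_gguf() {
  if [ "$(head -c 4 "$1")" = "GGUF" ]; then
    echo "$1: header looks like GGUF"
  else
    echo "$1: header is NOT GGUF"
  fi
}
# Example usage on the quant from this thread:
# check_gguf llama-2-70b-chat.Q4_K_M.gguf
```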


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.