Today is my first day trying this Orin Nano Super.
I was successful in setting up “ollama-server”.
And I can ssh into the device.
(The above is not related to the issue below.)
Then I wanted to try “text-generation-webui”.
The guide says: jetson-containers run $(autotag text-generation-webui)
but this starts a build process that fails (tried twice).
Then I pulled a container with: $ docker pull dustynv/text-generation-webui:r35.4.1
and started the server with: jetson-containers$ ./run.sh dustynv/text-generation-webui:r35.4.1
So far so good. I can access the website!
I downloaded a few models, but none of them work.
Even the one from the tutorial video fails:
Model: TheBloke_Llama-2-7B-GPTQ
Model_Loader: ExLlamav2_HF
Traceback (most recent call last):
File "/opt/text-generation-webui/modules/ui_model_menu.py", line 213, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
File "/opt/text-generation-webui/modules/models.py", line 87, in load_model
output = load_func_map[loader](model_name)
File "/opt/text-generation-webui/modules/models.py", line 389, in ExLlamav2_HF_loader
return Exllamav2HF.from_pretrained(model_name)
File "/opt/text-generation-webui/modules/exllamav2_hf.py", line 170, in from_pretrained
return Exllamav2HF(config)
File "/opt/text-generation-webui/modules/exllamav2_hf.py", line 44, in __init__
self.ex_model.load(split)
File "/usr/local/lib/python3.8/dist-packages/exllamav2/model.py", line 248, in load
for item in f: return item
File "/usr/local/lib/python3.8/dist-packages/exllamav2/model.py", line 266, in load_gen
module.load()
File "/usr/local/lib/python3.8/dist-packages/exllamav2/attn.py", line 188, in load
self.input_layernorm.load()
File "/usr/local/lib/python3.8/dist-packages/exllamav2/rmsnorm.py", line 24, in load
w = self.load_weight()
File "/usr/local/lib/python3.8/dist-packages/exllamav2/module.py", line 116, in load_weight
tensors = self.load_multi(["weight"], override_key = override_key)
File "/usr/local/lib/python3.8/dist-packages/exllamav2/module.py", line 77, in load_multi
tensors[k] = stfile.get_tensor(key + "." + k, device = self.device())
File "/usr/local/lib/python3.8/dist-packages/exllamav2/fasttensors.py", line 118, in get_tensor
return f.get_tensor(key)
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 255, in _lazy_init
torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
I have some Linux experience, but no knowledge of containers or Python.
Hopefully you can help, thanks.
Hi,
As mentioned in the link below:
Please run the memory optimization first and test the 7B models with 4-bit quantization.
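For reference, the optimization steps from the tutorial are roughly the following (the swap file path and size are examples, adjust them to your setup):
$ sudo init 3                                   # turn off the desktop GUI for this boot
$ sudo systemctl set-default multi-user.target  # keep it off across reboots
$ sudo systemctl disable nvzramconfig           # disable zram
$ sudo fallocate -l 8G /ssd/8GB.swap            # create a disk-backed swap file instead
$ sudo mkswap /ssd/8GB.swap
$ sudo swapon /ssd/8GB.swap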
Thanks.
I did the optimizations:
total used free shared buff/cache available
Mem: 7.4Gi 503Mi 6.4Gi 26Mi 581Mi 6.7Gi
Swap: 8.0Gi 0B 8.0Gi
Then I tried with these settings:
TheBloke/Llama-2-7b-Chat-GGUF
model: llama-2-7b-chat.Q4_K_M.gguf
model-loader: llama.cpp
n-gpu-layers = 128
And got this error:
20:31:52-888982 INFO Loading llama-2-7b-chat.Q4_K_M.gguf
20:31:52-983571 INFO llama.cpp weights detected: /data/models/text-generation-webui/llama-2-7b-chat.Q4_K_M.gguf
20:31:52-986285 ERROR Failed to load the model.
Traceback (most recent call last):
File "/opt/text-generation-webui/modules/ui_model_menu.py", line 213, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
File "/opt/text-generation-webui/modules/models.py", line 87, in load_model
output = load_func_map[loader](model_name)
File "/opt/text-generation-webui/modules/models.py", line 250, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
File "/opt/text-generation-webui/modules/llamacpp_model.py", line 63, in from_pretrained
Llama = llama_cpp_lib().Llama
AttributeError: 'NoneType' object has no attribute 'Llama'
Hi,
Sorry for missing that.
It looks like you are using Super mode, so your environment should be r36.4.2 or r36.4.3.
As there are dependencies between the GPU driver and the CUDA-related libraries, please use a container built for r36.4.x instead.
The dustynv/text-generation-webui:r35.4.1 image might have unexpected issues when running in an r36 environment.
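To double-check, you can confirm the L4T version on the host and whether PyTorch can see the GPU inside the container (these are generic checks, not specific to this image):
$ cat /etc/nv_tegra_release
$ python3 -c "import torch; print(torch.cuda.is_available())"
In a matching container, the second command should print True.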
You can build one with jetson-containers directly:
$ jetson-containers run $(autotag text-generation-webui)
Namespace(packages=['text-generation-webui'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.4.3 JETPACK_VERSION=6.2 CUDA_VERSION=12.6
-- Finding compatible container image for ['text-generation-webui']
Couldn't find a compatible container for text-generation-webui, would you like to build it? [y/N] y
-- Building containers ['build-essential', 'pip_cache:cu126', 'cuda:12.6', 'cudnn', 'python', 'numpy', 'cmake', 'onnx', 'pytorch:2.5', 'torchvision', 'huggingface_hub', 'rust', 'transformers', 'auto_gptq', 'flash-attention', 'exllama', 'llama_cpp', 'triton', 'auto_awq', 'text-generation-webui']
-- Building container text-generation-webui:r36.4.3-build-essential
...
Thanks.
That fixed it! Thanks!
There are a lot of new abbreviations and concepts. What is the best way to go forward from here?
I tried loading some other models from Hugging Face, but they all failed (even the smallest 1B and 2B models).
Hi,
It’s recommended to follow our tutorial first:
The Llama 7B model should work on Orin Nano 8GB.
But please remember to run the memory optimization first.
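For reference, models can also be downloaded from inside the container with the script bundled in text-generation-webui (the model name below is just an example):
$ cd /opt/text-generation-webui
$ python3 download-model.py TheBloke/Llama-2-7B-GPTQ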
Thanks.