Cannot run LLaVa with Orin NX

I cannot run LLaVa with L4T 35.5 on Jetson Orin NX 16G.

I tried referring to the following page.
https://www.jetson-ai-lab.com/tutorial_llava.html#1-chat-with-llava-using-text-generation-webui

I got the error below, but I cannot figure out what it means.

ERROR The model could not be loaded because its checkpoint file in .bin/.pt/.safetensors format could not be located.

Below are the detailed error messages.

-- Finding compatible container image for ['text-generation-webui']
[sudo] password for a:
dustynv/text-generation-webui:r35.4.1-cp310

sudo docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/a/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --workdir=/opt/text-generation-webui dustynv/text-generation-webui:r35.4.1-cp310 python3 server.py --listen --model-dir /data/models/text-generation-webui --model TheBloke_llava-v1.5-13B-GPTQ --multimodal-pipeline llava-v1.5-13b --loader autogptq --disable_exllama --verbose
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
10:00:36-198813 INFO     Starting Text generation web UI
10:00:36-204703 WARNING
                         You are potentially exposing the web UI to the entire internet without any access password.
                         You can create one with the "--gradio-auth" flag like this:

                         --gradio-auth username:password

                         Make sure to replace username:password with your own.

10:00:36-211187 INFO Loading settings from "settings.yaml"
10:00:36-216890 INFO Loading "TheBloke_llava-v1.5-13B-GPTQ"
10:00:36-268205 ERROR The model could not be loaded because its checkpoint file in .bin/.pt/.safetensors format could not be located.
10:00:36-271061 INFO Loading the extension "multimodal"
10:00:38-239843 INFO LLaVA - Loading CLIP from openai/clip-vit-large-patch14-336 as torch.float16 on cuda:0...
preprocessor_config.json: 100%|████████████████████████████████████████████████████████████████████████| 316/316 [00:00<00:00, 699kB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████████| 4.76k/4.76k [00:00<00:00, 7.46MB/s]
pytorch_model.bin: 11%|████████▎

File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 261, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1786, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1350, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 583, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 576, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 559, in run_sync_iterator_async
return next(iterator)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 742, in gen_wrapper
response = next(iterator)
File "/opt/text-generation-webui/modules/chat.py", line 414, in generate_chat_reply_wrapper
for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
File "/opt/text-generation-webui/modules/chat.py", line 382, in generate_chat_reply
for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
File "/opt/text-generation-webui/modules/chat.py", line 312, in chatbot_wrapper
raise ValueError("No model is loaded! Select one in the Model tab.")
ValueError: No model is loaded! Select one in the Model tab.

Hi @michtw, can you double-check that the model weights were downloaded under your jetson-containers/data/models/text-generation-webui directory? You might want to try running this step again to confirm the integrity of the downloads: https://www.jetson-ai-lab.com/tutorial_llava.html#download-model
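
For reference, a quick way to check from the host might look like this (the expected layout is an assumption based on the --model-dir and --model arguments above, so adjust the paths if your setup differs):

# Assumed layout: server.py is launched with --model-dir /data/models/text-generation-webui
# and --model TheBloke_llava-v1.5-13B-GPTQ, so it looks for the checkpoint inside a
# subdirectory of that name (host path: ~/jetson-containers/data).
$ ls -lh ~/jetson-containers/data/models/text-generation-webui/
$ ls -lh ~/jetson-containers/data/models/text-generation-webui/TheBloke_llava-v1.5-13B-GPTQ/
# You should see a *.safetensors (or *.bin/*.pt) checkpoint alongside config.json,
# tokenizer.model, and the other metadata files.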

If you continue having issues with LLaVA in text-generation-webui (which does not have a particularly stable or optimized multimodal pipeline), I would recommend moving on to this part of the tutorial: https://www.jetson-ai-lab.com/tutorial_nano-vlm.html#multimodal-chat

I have run the model download process twice, but I still cannot run LLaVA.
Because I am using L4T 35.5 (JetPack 5), do I need to check out the jp5 git branch?

The Multimodal Chat tutorial says it needs JetPack 6 to run. Does it also support JetPack 5?

Thanks.

Hi Dusty,

It seems that the text-generation-webui model has been downloaded.

$ ls jetson-containers/data/models/text-generation-webui/ -lh
total 6.8G
-rw-r--r-- 1 root root 1.9K Jul 19 17:43 config.json
-rw-r--r-- 1 root root  154 Jul 19 17:43 generation_config.json
-rw-r--r-- 1 root root  288 Jul 22 09:16 huggingface-metadata.txt
-rw-r--r-- 1 root root 6.9K Jul 19 17:43 LICENSE.txt
-rw-r--r-- 1 root root 6.8G Jul 19 18:00 model.safetensors
-rw-r--r-- 1 root root  134 Jul 19 17:43 quantize_config.json
-rw-r--r-- 1 root root  20K Jul 19 17:43 README.md
-rw-r--r-- 1 root root  438 Jul 19 17:43 special_tokens_map.json
-rw-r--r-- 1 root root  748 Jul 19 17:43 tokenizer_config.json
-rw-r--r-- 1 root root 489K Jul 19 17:43 tokenizer.model
-rw-r--r-- 1 root root 4.7K Jul 19 17:43 USE_POLICY.md

But there are still errors when I run text-generation-webui:

01:19:15-122606 INFO Loading settings from "settings.yaml"
01:19:15-129839 INFO Loading "TheBloke_llava-v1.5-13B-GPTQ"
01:19:15-197925 ERROR The model could not be loaded because its checkpoint file in .bin/.pt/.safetensors format could not be located.

$ sudo jetson-containers run --workdir=/opt/text-generation-webui $(autotag text-generation-webui) \
>   python3 server.py --listen \
>     --model-dir /data/models/text-generation-webui \
>     --model TheBloke_llava-v1.5-13B-GPTQ \
>     --multimodal-pipeline llava-v1.5-13b \
>     --loader autogptq \
>     --disable_exllama \
>     --verbose
Namespace(disable=[''], output='/tmp/autotag', packages=['text-generation-webui'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.5.0  JETPACK_VERSION=5.1  CUDA_VERSION=11.4
-- Finding compatible container image for ['text-generation-webui']
dustynv/text-generation-webui:r35.4.1-cp310
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/a/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --workdir=/opt/text-generation-webui dustynv/text-generation-webui:r35.4.1-cp310 python3 server.py --listen --model-dir /data/models/text-generation-webui --model TheBloke_llava-v1.5-13B-GPTQ --multimodal-pipeline llava-v1.5-13b --loader autogptq --disable_exllama --verbose
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
01:19:15-114302 INFO     Starting Text generation web UI                                                                               
01:19:15-120036 WARNING                                                                                                                
                         You are potentially exposing the web UI to the entire internet without any access password.                   
                         You can create one with the "--gradio-auth" flag like this:                                                   
                                                                                                                                       
                         --gradio-auth username:password                                                                               
                                                                                                                                       
                         Make sure to replace username:password with your own.                                                         
01:19:15-122606 INFO     Loading settings from "settings.yaml"                                                                         
01:19:15-129839 INFO     Loading "TheBloke_llava-v1.5-13B-GPTQ"                                                                        
01:19:15-197925 ERROR    The model could not be loaded because its checkpoint file in .bin/.pt/.safetensors format could not be        
                         located.                                                                                                      
01:19:15-200133 INFO     Loading the extension "multimodal"                                                                            
01:19:17-150736 INFO     LLaVA - Loading CLIP from openai/clip-vit-large-patch14-336 as torch.float16 on cuda:0...                     
01:19:20-707700 INFO     LLaVA - Loading projector from liuhaotian/llava-v1.5-13b as torch.float16 on cuda:0...                        
01:19:21-237836 INFO     LLaVA supporting models loaded, took 4.09 seconds                                                             
01:19:21-240776 INFO     Multimodal: loaded pipeline llava-v1.5-13b from pipelines/llava (LLaVA_v1_5_13B_Pipeline)                     

Running on local URL:  http://0.0.0.0:7860

Hi @michtw, I recall that in the oobabooga web UI there is a "use safetensors" option in the model loader section, so it may need a similar flag when launched from the command line. You no longer need the jp5 git branch of jetson-containers; that was from long ago. Also, Multimodal Chat should still work with JetPack 5, as I recently rebuilt the container for it on L4T R35.2.1.
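
As an additional sanity check (an illustrative sketch only, not a step from the tutorial), you could confirm that the checkpoint is visible inside the container at the path server.py is told to scan, i.e. under a subdirectory matching the --model name:

# Illustrative sketch: list the model folder from inside the same container image,
# through the same /data mount the web UI uses. The path below assumes the weights
# belong in a TheBloke_llava-v1.5-13B-GPTQ subdirectory of --model-dir.
$ sudo jetson-containers run $(autotag text-generation-webui) \
    ls -lh /data/models/text-generation-webui/TheBloke_llava-v1.5-13B-GPTQ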

Hi Dusty,

I cannot load the multimodal vision/language model with L4T 35.5.0; it ends up with the errors below.

ImportError: whisper_trt not installed (minimum BSP version JetPack 6 / L4T R36)

08:46:30 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-7b/snapshots/e87d80e4a90f70885036feeaf37fa1bc62048201 with MLC
08:46:30 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
08:46:32 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=918000, multiprocessors=4, max_thread_dims=[1024, 1024, 64], api_version=11040, driver_version=None
08:46:33 | INFO | loading VILA-7b from /data/models/mlc/dist/VILA-7b-ctx4096/VILA-7b-q4f16_ft/VILA-7b-q4f16_ft-cuda.so
08:46:34 | WARNING | model library /data/models/mlc/dist/VILA-7b-ctx4096/VILA-7b-q4f16_ft/VILA-7b-q4f16_ft-cuda.so was missing metadata
08:46:38 | INFO | loading clip vision model openai/clip-vit-large-patch14-336
08:46:43 | WARNING | disabling CLIP with TensorRT 8.5.2.2 (requires TensorRT 8.6 or newer)
08:46:43 | SUCCESS | loaded clip vision model openai/clip-vit-large-patch14-336
mm_projector (mlp2x_gelu) Sequential(
  (0): Linear(in_features=1024, out_features=4096, bias=True)
  (1): GELU(approximate='none')
  (2): Linear(in_features=4096, out_features=4096, bias=True)
)
mm_projector weights dict_keys(['0.bias', '0.weight', '2.bias', '2.weight'])
┌────────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│ _name_or_path              │ ../../checkpoints/llama-2-7b-mmc4-coyo-paper_reproduce                      │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ architectures              │ ['LlavaLlamaForCausalLM']                                                   │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_bias             │ False                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ attention_dropout          │ 0.0                                                                         │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ bos_token_id               │ 1                                                                           │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ eos_token_id               │ 2                                                                           │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ freeze_mm_mlp_adapter      │ False                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_act                 │ silu                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ hidden_size                │ 4096                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ image_aspect_ratio         │ pad                                                                         │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ initializer_range          │ 0.02                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ intermediate_size          │ 11008                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_position_embeddings    │ 4096                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_hidden_size             │ 1024                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_projector_lr            │                                                                             │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_projector_type          │ mlp2x_gelu                                                                  │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_use_im_patch_token      │ False                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_use_im_start_end        │ False                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_vision_select_feature   │ patch                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_vision_select_layer     │ -2                                                                          │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_vision_tower            │ openai/clip-vit-large-patch14-336                                           │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ model_type                 │ llama                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_attention_heads        │ 32                                                                          │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_hidden_layers          │ 32                                                                          │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ num_key_value_heads        │ 32                                                                          │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ pad_token_id               │ 0                                                                           │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ pretraining_tp             │ 1                                                                           │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rms_norm_eps               │ 1e-05                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_scaling               │                                                                             │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ rope_theta                 │ 10000.0                                                                     │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ tie_word_embeddings        │ False                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ tokenizer_model_max_length │ 4096                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ tokenizer_padding_side     │ right                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ torch_dtype                │ bfloat16                                                                    │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ transformers_version       │ 4.36.2                                                                      │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ tune_mm_mlp_adapter        │ False                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ use_cache                  │ True                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ use_mm_proj                │ True                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ vocab_size                 │ 32000                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ name                       │ VILA-7b                                                                     │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ api                        │ mlc                                                                         │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ mm_projector_path          │ /data/models/huggingface/models--Efficient-Large-Model--VILA-7b/snapshots/e │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ quant                      │ q4f16_ft                                                                    │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ type                       │ llama                                                                       │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ max_length                 │ 4096                                                                        │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ prefill_chunk_size         │ -1                                                                          │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ load_time                  │ 14.259775633999197                                                          │
├────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ params_size                │ 3232.7265625                                                                │
└────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘

08:46:45 | INFO | using chat template 'vicuna-v1' for model VILA-7b
08:46:45 | INFO | model 'VILA-7b', chat template 'vicuna-v1' stop tokens:  ['</s>'] -> [2]
08:46:45 | INFO | Warming up LLM with query 'What is 2+2?'
08:46:46 | INFO | Warmup response:  '4</s>'
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 327, in <module>
    agent = WebChat(**vars(args))
  File "/opt/NanoLLM/nano_llm/agents/web_chat.py", line 32, in __init__
    super().__init__(**kwargs)
  File "/opt/NanoLLM/nano_llm/agents/voice_chat.py", line 42, in __init__
    self.vad = VADFilter(**kwargs).add(self.asr) if self.asr else None
  File "/opt/NanoLLM/nano_llm/plugins/speech/vad_filter.py", line 49, in __init__
    raise ImportError("whisper_trt not installed (minimum BSP version JetPack 6 / L4T R36)")
ImportError: whisper_trt not installed (minimum BSP version JetPack 6 / L4T R36)

OK, sorry about that. I had added Whisper support, which needs JP6, and although that part is optional, the VADFilter comes with it and I hadn't switched it off for JP5. For the time being, I would try either disabling it in the code (which is easier if you mount an external clone of the NanoLLM repo into the container, as shown here), or falling back to the CLI or Python interface. Also, the VideoQuery agent does not use VAD/ASR, and in Agent Studio you can visually configure the pipeline however you want.
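
For reference, a rough sketch of those two workarounds (the flags and mount path follow the nano-vlm tutorial and NanoLLM docs as I remember them, so treat them as assumptions and adjust for your NanoLLM version):

# CLI fallback: chat from the terminal, which skips the WebChat/VADFilter path entirely.
$ jetson-containers run $(autotag nano_llm) \
    python3 -m nano_llm.chat --api=mlc \
      --model Efficient-Large-Model/VILA-7b

# Or mount a local clone of NanoLLM over the copy baked into the container,
# so the VAD/ASR plugins can be disabled by editing the code in place:
$ git clone https://github.com/dusty-nv/NanoLLM
$ jetson-containers run -v ${PWD}/NanoLLM:/opt/NanoLLM $(autotag nano_llm)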
