I want to try LLaVA with Jetson Orin

Hi

I want to try LLaVA with Jetson Orin.
I have been referring to the following page.

I cloned LLaVA from GitHub, pulled the Docker image for LLaVA, and started the container with LLaVA's directory mounted:

git clone https://github.com/haotian-liu/LLaVA.git
sudo docker pull dustynv/llava:r35.3.1
sudo docker run -it --runtime nvidia --network host --name llava -v /home/jetson/Desktop/work/LLaVA:/LLaVA dustynv/llava:r35.3.1 bash

I’ve done a lot of research on what to do next, but I can’t figure it out.
Could someone please help me?

Hi @Heartful-echo, you don’t need to separately clone/mount the LLaVA GitHub repo, it’s already installed in the container and the source is under /opt/llava. You would basically then just follow the LLaVA commands from their GitHub, like the ones shown on my readme page there.

That said, I don’t really recommend running the original LLaVA codebase, because it’s unquantized and slow for inference. There are more comprehensive and performant LLaVA examples on this tutorial page that use quantization:

@dusty_nv

Thank you for your advice.

I ran the Optimized Multimodal Pipeline with local_llm, and the following error was output.
Could you tell me how to deal with it?

I have a Hugging Face account and token, but I don’t know at which step I should set it…

echo@ubuntu:~/Desktop/work/jetson-containers$ ./run.sh $(./autotag local_llm) python3 -m local_llm --api=mlc --model liuhaotian/llava-v1.5-13b
Namespace(disable=[''], output='/tmp/autotag', packages=['local_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.3.1  JETPACK_VERSION=5.1.1  CUDA_VERSION=11.4.315
-- Finding compatible container image for ['local_llm']

Found compatible container dustynv/local_llm:r35.3.1 (2024-02-22, 8.8GB) - would you like to pull it? [Y/n] y
dustynv/local_llm:r35.3.1
+ sudo docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /home/echo/Desktop/work/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb dustynv/local_llm:r35.3.1 python3 -m local_llm --api=mlc --model liuhaotian/llava-v1.5-13b
Unable to find image 'dustynv/local_llm:r35.3.1' locally
r35.3.1: Pulling from dustynv/local_llm
Digest: sha256:b4de3266c45d2e4c69d122c91502dde0a185810711803bddc0b6c048c828a6f1
Status: Downloaded newer image for dustynv/local_llm:r35.3.1
.gitattributes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<00:00, 636kB/s]
generation_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 154/154 [00:00<00:00, 67.7kB/s]
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.16k/1.16k [00:00<00:00, 73.0kB/s]
README.md: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.36k/1.36k [00:00<00:00, 656kB/s]
pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33.7k/33.7k [00:00<00:00, 2.48MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 438/438 [00:00<00:00, 435kB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 749/749 [00:00<00:00, 812kB/s]
tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 888kB/s]
mm_projector.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 62.9M/62.9M [00:51<00:00, 1.23MB/s]
pytorch_model-00003-of-00003.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.24G/6.24G [43:57<00:00, 2.37MB/s]
pytorch_model-00002-of-00003.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.90G/9.90G [46:32<00:00, 3.55MB/s]
pytorch_model-00001-of-00003.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.95G/9.95G [54:23<00:00, 3.05MB/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [54:24<00:00, 272.06s/it]
05:01:43 | INFO | loading /data/models/huggingface/models--liuhaotian--llava-v1.5-13b/snapshots/d64eb781be6876a5facc160ab1899281f59ef684 with MLC
globbing  /data/models/mlc/dist/models/llava-v1.5-13b/*.safetensors
glob  []
05:01:44 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models/llava-v1.5-13b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 4096 --artifact-path /data/models/mlc/dist 


Using path "/data/models/mlc/dist/models/llava-v1.5-13b" for model "llava-v1.5-13b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param:   0%| 0/245 [00:00<?, ?tensors/s]
Start computing and quantizing weights... This may take a while.
Get old param:  99%| 242/245 [02:03<00:01,  2.19tensors/s]
Finish computing and quantizing weights. 406/407 [02:03<00:00,  7.24tensors/s]
Total param size: 6.085580825805664 GB
Start storing to cache /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/params
[0407/0407] saving param_406
All finished, 143 total shards committed, record saved to /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/params/ndarray-cache.json
Attempting to convert `tokenizer.model` to `tokenizer.json`.
Succesfully converted `tokenizer.model` to: /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/params/tokenizer.json
Finish exporting chat config to /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/params/mlc-chat-config.json

Save a cached module to /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/mod_cache_before_build.pkl.
Finish exporting to /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/llava-v1.5-13b-q4f16_ft-cuda.so
05:06:57 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=1300000, multiprocessors=16, max_thread_dims=[1024, 1024, 64], api_version=11040, driver_version=None
05:06:57 | INFO | loading llava-v1.5-13b from /data/models/mlc/dist/llava-v1.5-13b-q4f16_ft/llava-v1.5-13b-q4f16_ft-cuda.so
05:07:04 | INFO | loading openai/clip-vit-large-patch14-336
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 719, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 400, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 702, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 535, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 648, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: 6d08347a-59cb-44f3-bc49-1fb01ff3a0f6)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 430, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py", line 1371, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/__main__.py", line 22, in <module>
    model = LocalLM.from_pretrained(
  File "/opt/local_llm/local_llm/local_llm.py", line 80, in from_pretrained
    model.init_vision()  
  File "/opt/local_llm/local_llm/local_llm.py", line 181, in init_vision
    self.vision = CLIPImageEmbedding.from_pretrained(
  File "/opt/local_llm/local_llm/vision/clip_hf.py", line 24, in from_pretrained
    inst = CLIPImageEmbedding(model, dtype=dtype, **kwargs)
  File "/opt/local_llm/local_llm/vision/clip_hf.py", line 42, in __init__
    self.preprocessor = CLIPImageProcessor.from_pretrained(model, torch_dtype=self.dtype)#.to(self.device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/image_processing_utils.py", line 203, in from_pretrained
    image_processor_dict, kwargs = cls.get_image_processor_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/image_processing_utils.py", line 332, in get_image_processor_dict
    resolved_image_processor_file = cached_file(
  File "/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py", line 470, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like openai/clip-vit-large-patch14-336 is not the path to a directory containing a file named preprocessor_config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
echo@ubuntu:~/Desktop/work/jetson-containers$ 

Sorry for the delay @Heartful-echo, if you still need to put in your HuggingFace token, use --env HUGGINGFACE_TOKEN like this:

./run.sh --env HUGGINGFACE_TOKEN=abc123def456ghi789 $(./autotag local_llm) python3 -m local_llm --api=mlc --model liuhaotian/llava-v1.5-13b
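
If you prefer not to paste the token directly into the command, one option (just a sketch, using the same run.sh/autotag flow) is to export it in your shell first and pass the variable through:

export HUGGINGFACE_TOKEN=<your_token>
./run.sh --env HUGGINGFACE_TOKEN=$HUGGINGFACE_TOKEN $(./autotag local_llm) python3 -m local_llm --api=mlc --model liuhaotian/llava-v1.5-13b

Either way, the container picks the token up from the HUGGINGFACE_TOKEN environment variable.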

@dusty_nv

Thank you.
I set the token in the environment variable and confirmed that it works.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.