Jetson Orin Nano Live LLaVA implementation

Hi, I have recently continued working on the Live LLaVA implementation, and since there were some issues, I decided to redo the installation.

I ran this inference test first before moving on to the video query script:

python3 -m nano_llm.chat --api=mlc \
  --model Efficient-Large-Model/VILA1.5-3b \
  --max-context-len 256 \
  --max-new-tokens 32

and it returned this error:
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0

…..

ValueError: Error when loading parameters from params_shard_36.bin: [01:12:25] /opt/mlc-llm/3rdparty/tvm/src/runtime/cuda/cuda_device_api.cc:138: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: out of memory

Is there any help I can get with this? It was able to run on the first try, but this error has popped up on every subsequent run.

Could you check the memory status with the command below first?

$ sudo tegrastats
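
The free device memory can also be printed with a short PyTorch snippet (just a rough sketch using torch.cuda.mem_get_info; adjust it to wherever you run nano_llm):

import torch

# report free vs. total device memory (bytes) as seen by CUDA on device 0
free, total = torch.cuda.mem_get_info()
print(f"GPU memory: {free / 1024**3:.2f} GiB free of {total / 1024**3:.2f} GiB")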

Do you run the nano_llm command locally or within the container?
Thanks.

Hello, I just rebuilt it, and I'm running it in the container now. tegrastats showed no excessive RAM usage:

10-23-2025 09:13:56 RAM 3078/7620MB (lfb 17x4MB) SWAP 4/20194MB (cached 0MB) CPU [2%@729,0%@729,0%@729,1%@729,3%@729,2%@729] GR3D_FREQ 0% cpu@50.593C soc2@49.781C soc0@51.156C gpu@51.843C tj@51.843C soc1@51.593C VDD_IN 5277mW/5882mW VDD_CPU_GPU_CV 554mW/971mW VDD_SOC 1663mW/1709mW
10-23-2025 09:13:57 RAM 3074/7620MB (lfb 17x4MB) SWAP 4/20194MB (cached 0MB) CPU [1%@1344,4%@1344,3%@1344,1%@1344,7%@729,3%@729] GR3D_FREQ 0% cpu@50.656C soc2@49.781C soc0@51.031C gpu@51.875C tj@51.875C soc1@51.562C VDD_IN 5356mW/5868mW VDD_CPU_GPU_CV 633mW/962mW VDD_SOC 1663mW/1708mW

and the container was made using this:

jetson-containers run --name nano_llm1 \
  --runtime nvidia --gpus all \
  --shm-size=2g \
  --env HUGGINGFACE_TOKEN=(my hf token) \
  -v /ssd/models:/models \
  -v /ssd/hf_cache:/root/.cache/huggingface \
  $(autotag nano_llm)

Please advise, @AastaLLL. Thank you.

@AastaLLL, on running the video query script now, I am faced with this error as well:
Traceback (most recent call last):
  File "/opt/NanoLLM/nano_llm/plugin.py", line 321, in run
    self.process_inputs(timeout=0.25)
  File "/opt/NanoLLM/nano_llm/plugin.py", line 348, in process_inputs
    self.dispatch(input, **kwargs)
  File "/opt/NanoLLM/nano_llm/plugin.py", line 361, in dispatch
    outputs = self.process(input, **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 152, in process
    self.process(x, **kwargs)
  File "/opt/NanoLLM/nano_llm/plugins/chat_query.py", line 189, in process
    embedding, position = chat_history.embed_chat(
  File "/opt/NanoLLM/nano_llm/chat/history.py", line 369, in embed_chat
    embeddings.append(msg.embed())
  File "/opt/NanoLLM/nano_llm/chat/message.py", line 215, in embed
    self._embed_image(self.history.model, split_template, return_tensors=return_tensors, **kwargs)
  File "/opt/NanoLLM/nano_llm/chat/message.py", line 256, in _embed_image
    image_embeds = model.embed_image(self.content, return_tensors=return_tensors)
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 278, in embed_image
    embedding = self.mm_projector(embedding)
  File "/opt/NanoLLM/nano_llm/vision/mm_projector.py", line 139, in __call__
    return self.model(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py", line 219, in forward
    input = module(input)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (196x768 and 1024x2560)

Any help with this?
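
For what it's worth, the last frame shows F.linear being given 196x768 features while the mm_projector's layer has a 1024x2560 weight, so it looks like the vision encoder and the projector don't agree on the embedding size. A minimal standalone sketch in plain PyTorch (not the NanoLLM code, with the dimensions copied from the error message) reproduces the same error:

import torch
import torch.nn as nn

# hypothetical dims from the error message: 196 patch tokens of 768 dims each,
# fed into a projector layer that expects 1024-dim inputs
projector = nn.Linear(in_features=1024, out_features=2560)
image_features = torch.randn(196, 768)

projector(image_features)  # RuntimeError: mat1 and mat2 shapes cannot be multiplied (196x768 and 1024x2560)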

Thanks a lot.

Hi,

To check further, could you try running a CUDA sample to see if you can access the GPU within the container?

$ git clone -b v12.5 https://github.com/NVIDIA/cuda-samples.git
$ cd cuda-samples/Samples/0_Introduction/vectorAdd
$ make
$ ./vectorAdd 
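
If that is inconvenient, a rough GPU sanity check can also be done with the PyTorch that is already inside the nano_llm container (just a sketch of a vector add, not a replacement for the CUDA sample):

import torch

print("CUDA available:", torch.cuda.is_available())

# allocate two 1M-element vectors on the GPU and add them
a = torch.randn(1 << 20, device="cuda")
b = torch.randn(1 << 20, device="cuda")
c = a + b
torch.cuda.synchronize()  # make sure the kernel actually executed
print("GPU vector add OK, checksum:", c.sum().item())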

Thanks.
