Orin Nano Qwen3-VL-4B

Looking to run Qwen3-VL-4B with the Orin Nano.

Anyone get it running?

I created a new conda environment:

I installed torch, torchvision, and torchaudio from https://pypi.jetson-ai-lab.io/jp6/cu126

Installed latest transformers with pip install git+https://github.com/huggingface/transformers --index-url https://pypi.jetson-ai-lab.io/jp6/cu126
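For reference, the setup steps above might look like this as a single shell session (a sketch: the environment name and Python version are my assumptions; the index URL and package list are from the post):

```shell
# Hypothetical environment setup for Qwen3-VL on JetPack 6 / CUDA 12.6.
conda create -n qwen3vl python=3.10 -y
conda activate qwen3vl

# Jetson-specific wheels from the jetson-ai-lab index
pip install torch torchvision torchaudio \
    --index-url https://pypi.jetson-ai-lab.io/jp6/cu126

# Latest transformers from source, with the Jetson index available for dependencies
pip install git+https://github.com/huggingface/transformers \
    --extra-index-url https://pypi.jetson-ai-lab.io/jp6/cu126
```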

There’s an unsloth version that’s 4-bit Quantized, but I haven’t been able to get that working either.
unsloth/Qwen3-VL-4B-Instruct-unsloth-bnb-4bit · Hugging Face

Any ideas would be appreciated! ❤️

# -*- coding: utf-8 -*-

import torch
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

import os
os.environ['VLLM_WORKER_MULTIPROC_METHOD'] = 'spawn'

def prepare_inputs_for_vllm(messages, processor):
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # qwen_vl_utils 0.0.14+ required
    image_inputs, video_inputs, video_kwargs = process_vision_info(
        messages,
        image_patch_size=processor.image_processor.patch_size,
        return_video_kwargs=True,
        return_video_metadata=True
    )
    print(f"video_kwargs: {video_kwargs}")

    mm_data = {}
    if image_inputs is not None:
        mm_data['image'] = image_inputs
    if video_inputs is not None:
        mm_data['video'] = video_inputs

    return {
        'prompt': text,
        'multi_modal_data': mm_data,
        'mm_processor_kwargs': video_kwargs
    }

if __name__ == '__main__':
    # messages = [
    #     {
    #         "role": "user",
    #         "content": [
    #             {
    #                 "type": "video",
    #                 "video": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-VL/space_woaudio.mp4",
    #             },
    #             {"type": "text", "text": "How long is this video?"},
    #         ],
    #     }
    # ]

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3-VL/receipt.png",
                },
                {"type": "text", "text": "Read all the text in the image."},
            ],
        }
    ]

    # TODO: change to your own checkpoint path
    checkpoint_path = "Qwen/Qwen3-VL-4B-Instruct-FP8"
    processor = AutoProcessor.from_pretrained(checkpoint_path)
    inputs = [prepare_inputs_for_vllm(message, processor) for message in [messages]]

    llm = LLM(
        model=checkpoint_path,
        trust_remote_code=True,
        gpu_memory_utilization=0.70,
        enforce_eager=False,
        tensor_parallel_size=torch.cuda.device_count(),
        seed=0
    )

    sampling_params = SamplingParams(
        temperature=0,
        max_tokens=1024,
        top_k=-1,
        stop_token_ids=[],
    )

    for i, input_ in enumerate(inputs):
        print()
        print('=' * 40)
        print(f"Inputs[{i}]: {input_['prompt']=!r}")
    print('\n' + '>' * 40)

    outputs = llm.generate(inputs, sampling_params=sampling_params)
    for i, output in enumerate(outputs):
        generated_text = output.outputs[0].text
        print()
        print('=' * 40)
        print(f"Generated text: {generated_text!r}")

Hi,

We haven't tried Qwen3-VL-4B, but we have tested Qwen2.5-VL-3B on Orin Nano, and it works correctly.
Please note that you will need to apply the memory optimization mentioned in the link below:

Please find our detailed setup (container, parameters, and commands) for running Qwen2.5-VL-3B in the link below:
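For context, the memory optimizations commonly recommended for Jetson boards look roughly like this (a sketch of standard practice, not necessarily what the linked topic describes; the swapfile path and size are hypothetical):

```shell
# Disable zram (compressed RAM swap), which competes for the same physical memory
sudo systemctl disable nvzramconfig

# Create and enable a disk-backed swapfile instead (path/size are examples)
sudo fallocate -l 16G /mnt/16GB.swap
sudo chmod 600 /mnt/16GB.swap
sudo mkswap /mnt/16GB.swap
sudo swapon /mnt/16GB.swap

# Boot to console instead of the desktop GUI to free additional RAM
sudo systemctl set-default multi-user.target
```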

Thanks.

Thank you, I will poke around and see if I can get it running.

Not 4B, but so far I've managed to get 2B (FP16) running with Transformers after a fresh SD card install. I'm still trying to get my SSD flashed, but this works in the meantime. It uses nearly all the RAM, but 4B should work with 4-bit quants once I get that working.
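A quick back-of-envelope calculation shows why 2B at FP16 nearly fills an 8 GB Orin Nano while 4B should fit with 4-bit quantization (weight-only estimate; KV cache, activations, and the OS need headroom on top):

```python
def weight_gib(n_params_billions: float, bits_per_param: float) -> float:
    """Approximate weight-only memory in GiB (ignores KV cache and activations)."""
    return n_params_billions * 1e9 * bits_per_param / 8 / 2**30

# 2B at FP16 (~16 bits/param): roughly 3.7 GiB of weights alone
print(round(weight_gib(2, 16), 1))   # -> 3.7
# 4B at FP16 would be ~7.5 GiB -- over budget on an 8 GB board with the OS loaded
print(round(weight_gib(4, 16), 1))   # -> 7.5
# 4B at 4-bit: ~1.9 GiB of weights, leaving headroom for KV cache
print(round(weight_gib(4, 4), 1))    # -> 1.9
```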

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch

# default: Load the model on the available device(s)

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-2B-Instruct", dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

Running the above Python script, I get this error, but the model still produces output. Is there anything I should be concerned about or adjust?

NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
NvMapMemAllocInternalTagged: 1075072515 error 12
NvMapMemHandleAlloc: error 0
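The "error 12" in those NvMap messages likely corresponds to the standard Linux errno 12, ENOMEM ("Cannot allocate memory"), which would be consistent with the model using nearly all the RAM. A quick way to check what an errno value means:

```python
import errno
import os

# errno 12 on Linux is ENOMEM
print(errno.errorcode[12])   # -> 'ENOMEM'
print(os.strerror(12))       # human-readable description
```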

Hey, have you had any more luck since? My nano just came in, so as soon as I can figure out how to boot from SSD, I plan to try the Qwen3 family, including VL.

Qwen3-VL support was just added to llama.cpp yesterday, so that would probably be the path of least resistance. You'll probably have to build it from source, but that's easy.
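Building and running it might look like this (a sketch: the GGUF and mmproj file names are assumptions; check Hugging Face for actual Qwen3-VL GGUF conversions):

```shell
# Build llama.cpp with CUDA support
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j4

# Multimodal inference needs both the main GGUF and the mmproj (vision) file
./build/bin/llama-mtmd-cli \
    -m Qwen3-VL-4B-Instruct-Q4_K_M.gguf \
    --mmproj mmproj-Qwen3-VL-4B-Instruct-f16.gguf \
    --image receipt.png \
    -p "Read all the text in the image."
```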

Hi,

NvMapMemAllocInternalTagged: 1075072515 error 12

This is a known issue on r36.4.7, which reports the same error message above.
You can find more details in the topic below:

Currently, we are still working on the issue internally.
Will let you know once we have any new updates for this.

Thanks.

Hi,

We have fixed the memory issue internally.
Please check the topic shared above for more information.

Thanks.
