I’m performing a simple binary classification task using the VILA1.5-3b model on a Jetson Orin Nano 8GB via nano_llm (MLC API). The goal is to force a definitive ‘YES’ or ‘NO’ response, but the model fails to adhere to the format.
Problem Description
Despite using a clear system prompt and a prompt anchor (… Answer: ), the model consistently outputs non-text data or incorrectly formatted responses, for example:
- Immediate termination (empty output)
- Raw token IDs: 1 or 0
- List markers: -
The model struggles to generate the required two-to-three character string (YES/NO).
Environment and Command
Hardware: Jetson Orin Nano 8GB
Model: Efficient-Large-Model/VILA1.5-3b
API: mlc
Current Command:
python3 -m nano_llm.chat --api=mlc \
--model Efficient-Large-Model/VILA1.5-3b \
--quantization q4f16_ft \
--max-context-len 256 \
--max-new-tokens 16 \
--vision-scaling resize \
--system-prompt "You are an expert vision model. Respond to the user's question with only the English word 'YES' or 'NO'." \
--prompt '/data/images/sample.jpg' \
--prompt 'Is there an object placed in front of the cardboard divider in this image? Answer: '
Questions
Fixed Output Format: How can we ensure the VILA1.5-3b model, when run with MLC/nano_llm, reliably generates only the string ‘YES’ or ‘NO’ and nothing else, preventing the premature termination and unexpected token output?
–vision-scaling Default: The default setting for --vision-scaling is crop. Specifically, what cropping method (e.g., center crop, random crop) is implemented when this default is used?
This is related to the prompt.
It’s not guaranteed that the output will always be ‘YES’ or ‘NO’.
But you can add a simple checker to validate the output. If the check fails, run the inference again to get a new output.
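A minimal sketch of such a checker with a retry loop in plain Python (the cleanup rules and the retry count are assumptions, not part of nano_llm):

```python
from typing import Callable, Optional

VALID = {"YES", "NO"}

def check_answer(raw: str) -> Optional[str]:
    """Return 'YES' or 'NO' if the raw model output is valid, else None."""
    cleaned = raw.strip().strip(".!'\"").upper()
    return cleaned if cleaned in VALID else None

def classify(run_inference: Callable[[], str], max_retries: int = 3) -> Optional[str]:
    """Re-run inference until the output passes the checker (assumed retry policy)."""
    for _ in range(max_retries):
        answer = check_answer(run_inference())
        if answer is not None:
            return answer
    return None  # caller decides how to handle persistent failures
```

For instance, `classify(lambda: "yes.\n")` would normalize the raw generation to "YES", while outputs like "1" or "-" would trigger a retry.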
Could you share which document mentions the vision-scaling?
Thank you for your response. Based on your reply, I have summarized my understanding and follow-up questions below.
1. Regarding the Output Response
I understand that strictly limiting the model output to just ‘YES’ or ‘NO’ is inherently difficult.
For inputs where the correct answer should have been ‘YES’, the model occasionally responded with ‘1’ instead. I suspect this might be due to the reason you mentioned. (I haven’t yet observed the inverse: inputs that should be ‘NO’ responding with ‘0’.)
Also, regarding the proposed “simpler checker,” are you suggesting that we implement an external converter to change outputs like ‘0’, ‘False’, or ‘NG’ to ‘NO’, and ‘1’, ‘True’, or ‘OK’ to ‘YES’?
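If that is the idea, a converter along these lines might work (the token sets below are assumptions based on the outputs mentioned in this thread; extend them with whatever variants you actually observe):

```python
# Hypothetical mapping of observed raw outputs onto the canonical labels.
YES_TOKENS = {"1", "TRUE", "OK", "YES"}
NO_TOKENS = {"0", "FALSE", "NG", "NO"}

def normalize(raw: str):
    """Map a raw model output to 'YES'/'NO'; return None for anything unrecognized."""
    token = raw.strip().upper()
    if token in YES_TOKENS:
        return "YES"
    if token in NO_TOKENS:
        return "NO"
    return None
```

Unrecognized outputs (e.g. a bare list marker) return None, which could then feed the re-run logic suggested above.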
2. Regarding --vision-scaling
I found the --vision-scaling argument in the documentation below:
According to this documentation, it appears the default is resize, and choosing crop results in a center-crop.
I was looking through the arguments because I suspected that if the regions removed by cropping are lost from the feature map, the model might fail to make a correct judgment.
Are there any other effective arguments that you recommend I review or change to address this problem?