Errors in the NanoVLM tutorial

I followed the tutorial “NanoVLM - Efficient Multimodal Pipeline” (NanoVLM - NVIDIA Jetson AI Lab) on a Jetson Orin Nano (8GB) with a 128GB SD card.
I ran the following command for each of three models (changing only the “--model” flag), and each produced a different error.

I want to know how to solve each error.

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.chat --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32
  1. VILA1.5-3b (--model Efficient-Large-Model/VILA1.5-3b)
    After the model download completed, an error occurred during the MLC quantization step: “Exception: The model config should contain information about maximum sequence length.”
    Details are as follows (a quick config check is sketched after the log).
seongkyu@ubuntu:~$ jetson-containers run $(autotag nano_llm) \
>   python3 -m nano_llm.chat --api=mlc \
>     --model Efficient-Large-Model/VILA1.5-3b \
>     --max-context-len 256 \
>     --max-new-tokens 32
Namespace(disable=[''], output='/tmp/autotag', packages=['nano_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.5.0  JETPACK_VERSION=5.1  CUDA_VERSION=11.4
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r35.4.1
[sudo] password for seongkyu:
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/seongkyu/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 dustynv/nano_llm:r35.4.1 python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA1.5-3b --max-context-len 256 --max-new-tokens 32
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Fetching 13 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 122.77it/s]
Fetching 17 files:   0%|                                                                                                                                                            | 0/17 [00:00<?, ?it/s]
llm/model-00001-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [1:12:51<00:00, 721kB/s]
Fetching 17 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [1:12:51<00:00, 257.17s/it]
09:01:17 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/699b413ed13620957e955bd7fb938852afa258fc with MLC
09:01:20 | INFO | backing up original model config to /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/699b413ed13620957e955bd7fb938852afa258fc/config.json.backup
09:01:20 | INFO | patching model config with {'model_type': 'llama'}
09:01:20 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b-ctx256


Using path "/data/models/mlc/dist/models/VILA1.5-3b" for model "VILA1.5-3b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 47, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 43, in main
    core.build_model_from_args(parsed_args)
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/core.py", line 834, in build_model_from_args
    mod, param_manager, params, model_config = model_generators[args.model_category].get_model(
  File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/llama.py", line 1333, in get_model
    raise Exception(
Exception: The model config should contain information about maximum sequence length.
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 29, in <module>
    model = NanoLLM.from_pretrained(
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 71, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 59, in __init__
    quant = MLCModel.quantize(model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 278, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)  
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b-ctx256 ' returned non-zero exit status 1.
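My guess is that the VILA1.5-3b config.json simply lacks whichever field mlc_llm reads for the maximum sequence length. A quick check I am considering running inside the container (the snapshot path is copied from the log above; the field names are my assumption about what mlc_llm looks for):

import json

# Downloaded snapshot path, copied from the "loading ... with MLC" log line above
cfg_path = ("/data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b"
            "/snapshots/699b413ed13620957e955bd7fb938852afa258fc/config.json")

with open(cfg_path) as f:
    cfg = json.load(f)

# Field names are an assumption about what mlc_llm's llama builder reads
for key in ("max_sequence_length", "max_position_embeddings"):
    print(key, "=", cfg.get(key, "<missing>"))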
  2. Obsidian-3B (--model NousResearch/Obsidian-3B-V0.5)
    I suspected the previous problem occurred because only the ‘VILA1.5-3b’ model files on Hugging Face were missing the maximum-sequence-length information, so I switched models and tried again.
    However, the download from Hugging Face stalled partway through, and the following error occurred.



    I think we need access to the “force_download” and “resume_download” parameters inside nano_llm, but the tutorial doesn’t say how (see the sketch below for what I mean).
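Something like the following is what I have in mind: pre-fetching the weights into the mounted data directory so the interrupted download can be resumed before nano_llm runs (the cache_dir is my assumption, based on the /data volume mount shown in the docker command above):

from huggingface_hub import snapshot_download

# Resume (or force-restart) the interrupted download into the same cache nano_llm uses;
# /data/models/huggingface is assumed from the /data volume mount shown above.
snapshot_download(
    repo_id="NousResearch/Obsidian-3B-V0.5",
    cache_dir="/data/models/huggingface",
    resume_download=True,   # pick up partially downloaded files
    force_download=False,   # set True to discard them and start over
)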

  3. Llava-7b (--model liuhaotian/llava-v1.6-vicuna-7b)
    I suspected problem 2 was caused by an unstable connection to the Hugging Face server during the download, so I switched models once again.
    llava-7b made it through the model download and quantization without issue.
    However, after entering the path of an image file at the prompt and then asking a question about it, the following InternalError occurred (a rough size calculation is sketched after the log).

seongkyu@ubuntu:~$ jetson-containers run $(autotag nano_llm) \
>   python3 -m nano_llm.chat --api=mlc \
>     --model liuhaotian/llava-v1.6-vicuna-7b \
>     --max-context-len 256 \
>     --max-new-tokens 32 \
>     --prompt /data/prompts/images.json
Namespace(disable=[''], output='/tmp/autotag', packages=['nano_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.5.0  JETPACK_VERSION=5.1  CUDA_VERSION=11.4
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r35.4.1
[sudo] password for seongkyu:
!Sorry, try again.
[sudo] password for seongkyu:
Sorry, try again.
[sudo] password for seongkyu:
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/seongkyu/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 dustynv/nano_llm:r35.4.1 python3 -m nano_llm.chat --api=mlc --model liuhaotian/llava-v1.6-vicuna-7b --max-context-len 256 --max-new-tokens 32 --prompt /data/prompts/images.json
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
04:45:14 | INFO | loading prompts from /data/prompts/images.json
Fetching 10 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 59493.67it/s]
Fetching 13 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 101.75it/s]
04:45:15 | INFO | loading /data/models/huggingface/models--liuhaotian--llava-v1.6-vicuna-7b/snapshots/deae57a8c0ccb0da4c2661cc1891cc9d06503d11 with MLC
04:45:19 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=624000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=11040, driver_version=None
04:45:19 | INFO | loading llava-v1.6-vicuna-7b from /data/models/mlc/dist/llava-v1.6-vicuna-7b-ctx256/llava-v1.6-vicuna-7b-q4f16_ft/llava-v1.6-vicuna-7b-q4f16_ft-cuda.so
04:45:28 | WARNING | model library /data/models/mlc/dist/llava-v1.6-vicuna-7b-ctx256/llava-v1.6-vicuna-7b-q4f16_ft/llava-v1.6-vicuna-7b-q4f16_ft-cuda.so was missing metadata
04:46:23 | INFO | loading clip vision model openai/clip-vit-large-patch14-336
<class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'> openai/clip-vit-large-patch14-336 CLIPImageProcessor {
  "_valid_processor_keys": [
    "images",
    "do_resize",
    "size",
    "resample",
    "do_center_crop",
    "crop_size",
    "do_rescale",
    "rescale_factor",
    "do_normalize",
    "image_mean",
    "image_std",
    "do_convert_rgb",
    "return_tensors",
    "data_format",
    "input_data_format"
  ],
  "crop_size": {
    "height": 336,
    "width": 336
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_processor_type": "CLIPImageProcessor",
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 336
  }
}

<class 'transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection'> openai/clip-vit-large-patch14-336 CLIPVisionModelWithProjection(
  (vision_model): CLIPVisionTransformer(
    (embeddings): CLIPVisionEmbeddings(
      (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
      (position_embedding): Embedding(577, 1024)
    )
    (pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (encoder): CLIPEncoder(
      (layers): ModuleList(
        (0-23): 24 x CLIPEncoderLayer(
          (self_attn): CLIPAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): CLIPMLP(
            (activation_fn): QuickGELUActivation()
            (fc1): Linear(in_features=1024, out_features=4096, bias=True)
            (fc2): Linear(in_features=4096, out_features=1024, bias=True)
          )
          (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        )
      )
    )
    (post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (visual_projection): Linear(in_features=1024, out_features=768, bias=False)
)
┌──────────────┬───────────────────────────────────┐
│ name         │ openai/clip-vit-large-patch14-336 │
├──────────────┼───────────────────────────────────┤
│ input_shape  │ (336, 336)                        │
├──────────────┼───────────────────────────────────┤
│ output_shape │ torch.Size([1, 768])              │
└──────────────┴───────────────────────────────────┘
04:47:44 | INFO | loading mm_projector weights from /data/models/huggingface/models--liuhaotian--llava-v1.6-vicuna-7b/snapshots/deae57a8c0ccb0da4c2661cc1891cc9d06503d11/mm_projector.bin
mm_projector Sequential(
  (0): Linear(in_features=1024, out_features=4096, bias=True)
  (1): GELU(approximate='none')
  (2): Linear(in_features=4096, out_features=4096, bias=True)
)
┌────────────────────────────┬────────────────────────────────────────────────────────────────┐
│ _name_or_path              │ ./checkpoints/vicuna-7b-v1-5                                   │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ architectures              │ ['LlavaLlamaForCausalLM']                                      │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ attention_bias             │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ attention_dropout          │ 0.0                                                            │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ bos_token_id               │ 1                                                              │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ eos_token_id               │ 2                                                              │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ freeze_mm_mlp_adapter      │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ freeze_mm_vision_resampler │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ hidden_act                 │ silu                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ hidden_size                │ 4096                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_aspect_ratio         │ anyres                                                         │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_crop_resolution      │ 224                                                            │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_grid_pinpoints       │ [[336, 672], [672, 336], [672, 672], [1008, 336], [336, 1008]] │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_split_resolution     │ 224                                                            │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ initializer_range          │ 0.02                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ intermediate_size          │ 11008                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ max_position_embeddings    │ 4096                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_hidden_size             │ 1024                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_patch_merge_type        │ spatial_unpad                                                  │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_projector_lr            │                                                                │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_projector_type          │ mlp2x_gelu                                                     │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_resampler_type          │                                                                │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_use_im_patch_token      │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_use_im_start_end        │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_select_feature   │ patch                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_select_layer     │ -2                                                             │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_tower            │ openai/clip-vit-large-patch14-336                              │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_tower_lr         │ 2e-06                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ model_type                 │ llama                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ num_attention_heads        │ 32                                                             │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ num_hidden_layers          │ 32                                                             │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ num_key_value_heads        │ 32                                                             │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ pad_token_id               │ 0                                                              │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ pretraining_tp             │ 1                                                              │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ rms_norm_eps               │ 1e-05                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ rope_scaling               │                                                                │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ rope_theta                 │ 10000.0                                                        │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tie_word_embeddings        │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tokenizer_model_max_length │ 4096                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tokenizer_padding_side     │ right                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ torch_dtype                │ bfloat16                                                       │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ transformers_version       │ 4.36.2                                                         │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tune_mm_mlp_adapter        │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tune_mm_vision_resampler   │ False                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ unfreeze_mm_vision_tower   │ True                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ use_cache                  │ True                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ use_mm_proj                │ True                                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ vocab_size                 │ 32000                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ name                       │ llava-v1.6-vicuna-7b                                           │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ api                        │ mlc                                                            │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ quant                      │ q4f16_ft                                                       │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ type                       │ llama                                                          │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ max_length                 │ 256                                                            │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ prefill_chunk_size         │ -1                                                             │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ load_time                  │ 150.4796289320002                                              │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ params_size                │ 3232.7265625                                                   │
└────────────────────────────┴────────────────────────────────────────────────────────────────┘

04:47:46 | INFO | using chat template 'vicuna-v1' for model llava-v1.6-vicuna-7b
04:47:46 | INFO | model 'llava-v1.6-vicuna-7b', chat template 'vicuna-v1' stop tokens:  ['</s>'] -> [2]
>> PROMPT: /data/images/dogs.jpg

>> PROMPT: What breeds of dogs are in the image?

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 523, in _run
    self._generate(stream)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 458, in _generate
    output = self._prefill(input,  # prefill_with_embed
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  [bt] (8) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x230) [0xfffebf9ac6c8]
  [bt] (7) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()+0x210) [0xfffebf9aad58]
  [bt] (6) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)+0x5e4) [0xfffebf9ab5bc]
  [bt] (5) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x7c) [0xfffebf9a99fc]
  [bt] (4) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::NDArray (tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)>::AssignTypedLambda<tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}>(tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)+0x10) [0xfffebf977638]
  [bt] (3) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::TypedPackedFunc<tvm::runtime::NDArray (tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)>::AssignTypedLambda<tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}>(tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const+0x27c) [0xfffebf977374]
  [bt] (2) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::memory::StorageObj::AllocNDArray(long, tvm::runtime::ShapeTuple, DLDataType)+0x3a8) [0xfffebf9268c8]
  [bt] (1) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x78) [0xfffebd57af58]
  [bt] (0) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xfffebf9236f0]
  File "/opt/mlc-llm/3rdparty/tvm/src/runtime/memory/memory_manager.cc", line 108
InternalError: Check failed: (offset + needed_size <= this->buffer.size) is false: storage allocation failure, attempted to allocate 15360000 at offset 0 in region that is 11272192bytes
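
My rough reading of the failed allocation, assuming the hidden_size of 4096 from the config table above and 2-byte fp16 values (this interpretation may be wrong), is that the image produces far more embedding tokens than the 256-token context I configured:

# Back-of-the-envelope check of the failed allocation (my interpretation; may be wrong)
hidden_size = 4096          # from the config table above
bytes_per_value = 2         # fp16
failed_alloc = 15_360_000   # bytes requested, from the InternalError message
region_size = 11_272_192    # bytes available in the region

print(failed_alloc / (hidden_size * bytes_per_value))   # 1875.0 tokens requested
print(region_size / (hidden_size * bytes_per_value))    # 1376.0 tokens available

Both numbers are far above the --max-context-len of 256 I used, so I wonder whether llava-v1.6 (which tiles the image via its anyres setting) simply needs a larger context length than the tutorial command shows.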

Hi @ygoongood12, I just tried re-running the quantization with the same command as yours on JetPack 5.1.2 / L4T R35 and did not hit the issue. Can you try pulling the latest nano_llm container on your end?

sudo docker pull dustynv/nano_llm:r35.4.1

Thank you for your reply.
I pulled the latest nano_llm container as you suggested, but I got the same error message.
Also, as described in this issue, I changed the model twice and got a different error message each time.
Could you take another look and reply again?

Hi @ygoongood12, sorry about that; I had only seen those error messages in older versions of the container. Given the model download problems you had, I recommend deleting your /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b directory.
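For reference, removing the cached download amounts to deleting that directory, roughly like this (the host-side path is assumed from the /data volume mount in your docker command; adjust it if your jetson-containers checkout lives elsewhere):

import shutil

# Delete the partially/incorrectly downloaded VILA snapshot so it gets re-fetched
shutil.rmtree(
    "/home/seongkyu/jetson-containers/data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b",
    ignore_errors=True,
)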

It may also be related to being on JetPack 5 instead of the latest JetPack 6, although these model tests did pass on JetPack 5. I believe this tag is actually the newer one for JetPack 5 at present: dustynv/nano_llm:24.5.1-r35.4.1
