I followed the "NanoVLM - Efficient Multimodal Pipeline" tutorial (NanoVLM - NVIDIA Jetson AI Lab) on a Jetson Orin Nano (8GB) with a 128GB SD card.
I ran the following command for each of three models (changing only the --model flag), and each one failed with a different error.
I would like to know how to resolve each of these errors.
jetson-containers run $(autotag nano_llm) \
python3 -m nano_llm.chat --api=mlc \
--model Efficient-Large-Model/VILA1.5-3b \
--max-context-len 256 \
--max-new-tokens 32
- VILA1.5-3b (--model Efficient-Large-Model/VILA1.5-3b)
After the model download completed, the MLC quantization step failed with: "Exception: The model config should contain information about maximum sequence length."
Details are as follows.
seongkyu@ubuntu:~$ jetson-containers run $(autotag nano_llm) \
> python3 -m nano_llm.chat --api=mlc \
> --model Efficient-Large-Model/VILA1.5-3b \
> --max-context-len 256 \
> --max-new-tokens 32
Namespace(disable=[''], output='/tmp/autotag', packages=['nano_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.5.0 JETPACK_VERSION=5.1 CUDA_VERSION=11.4
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r35.4.1
[sudo] password for seongkyu:
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/seongkyu/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 dustynv/nano_llm:r35.4.1 python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA1.5-3b --max-context-len 256 --max-new-tokens 32
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Fetching 13 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 122.77it/s]
Fetching 17 files: 0%| | 0/17 [00:00<?, ?it/s]
llm/model-00001-of-00002.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████| 4.97G/4.97G [1:12:51<00:00, 721kB/s]
Fetching 17 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [1:12:51<00:00, 257.17s/it]
09:01:17 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/699b413ed13620957e955bd7fb938852afa258fc with MLC
09:01:20 | INFO | backing up original model config to /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/699b413ed13620957e955bd7fb938852afa258fc/config.json.backup
09:01:20 | INFO | patching model config with {'model_type': 'llama'}
09:01:20 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b-ctx256
Using path "/data/models/mlc/dist/models/VILA1.5-3b" for model "VILA1.5-3b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 47, in <module>
main()
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/build.py", line 43, in main
core.build_model_from_args(parsed_args)
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/core.py", line 834, in build_model_from_args
mod, param_manager, params, model_config = model_generators[args.model_category].get_model(
File "/usr/local/lib/python3.8/dist-packages/mlc_llm/relax_model/llama.py", line 1333, in get_model
raise Exception(
Exception: The model config should contain information about maximum sequence length.
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 29, in <module>
model = NanoLLM.from_pretrained(
File "/opt/NanoLLM/nano_llm/nano_llm.py", line 71, in from_pretrained
model = MLCModel(model_path, **kwargs)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 59, in __init__
quant = MLCModel.quantize(model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 278, in quantize
subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA1.5-3b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA1.5-3b-ctx256 ' returned non-zero exit status 1.
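Is the expected workaround here to add the missing field to the downloaded config.json by hand and re-run? A minimal sketch of what I have in mind is below; the field names max_sequence_length / max_position_embeddings and the value 2048 are my guesses based on the error text, and the snapshot path is the one printed in the log above.

import json

# Guessed workaround: add a maximum-sequence-length field to the model's
# config.json before re-running quantization. Field names and the 2048 value
# are assumptions; the snapshot path is taken from the log above.
config_path = ("/data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/"
               "snapshots/699b413ed13620957e955bd7fb938852afa258fc/config.json")

with open(config_path) as f:
    cfg = json.load(f)

cfg.setdefault("max_sequence_length", 2048)
cfg.setdefault("max_position_embeddings", 2048)

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)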
- Obsidian-3B (--model NousResearch/Obsidian-3B-V0.5)
I assumed the previous problem was specific to the VILA1.5-3b files on Hugging Face lacking the maximum sequence length information, so I switched models and tried again.
However, the download from Hugging Face stopped partway through with an error.
I think I need access to the "force_download" and "resume_download" parameters inside nano_llm, but the tutorial doesn't explain how.
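Would pre-fetching the model with huggingface_hub be a reasonable way around this? A rough sketch of what I mean is below; the cache_dir is my assumption based on the /data/models/huggingface path the container uses, and resume_download=True should pick up partially downloaded files.

from huggingface_hub import snapshot_download

# Sketch: download (or resume downloading) the model outside of nano_llm,
# so the chat command can then load it from the local cache.
snapshot_download(
    repo_id="NousResearch/Obsidian-3B-V0.5",
    cache_dir="/data/models/huggingface",  # assumed cache path inside the container
    resume_download=True,
)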
- Llava-7b (--model liuhaotian/llava-v1.6-vicuna-7b)
I thought problem 2 occurred because the Hugging Face server was unstable during the download, so I changed the model once again.
With llava-7b, both the model download and the quantization completed without issue.
However, after entering the path of an image file at the prompt and then asking a question about it, the following InternalError occurred.
seongkyu@ubuntu:~$ jetson-containers run $(autotag nano_llm) \
> python3 -m nano_llm.chat --api=mlc \
> --model liuhaotian/llava-v1.6-vicuna-7b \
> --max-context-len 256 \
> --max-new-tokens 32 \
> --prompt /data/prompts/images.json
Namespace(disable=[''], output='/tmp/autotag', packages=['nano_llm'], prefer=['local', 'registry', 'build'], quiet=False, user='dustynv', verbose=False)
-- L4T_VERSION=35.5.0 JETPACK_VERSION=5.1 CUDA_VERSION=11.4
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r35.4.1
[sudo] password for seongkyu:
!Sorry, try again.
[sudo] password for seongkyu:
Sorry, try again.
[sudo] password for seongkyu:
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/seongkyu/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/video1 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-3 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-6 --device /dev/i2c-7 --device /dev/i2c-8 --device /dev/i2c-9 dustynv/nano_llm:r35.4.1 python3 -m nano_llm.chat --api=mlc --model liuhaotian/llava-v1.6-vicuna-7b --max-context-len 256 --max-new-tokens 32 --prompt /data/prompts/images.json
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.18) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
04:45:14 | INFO | loading prompts from /data/prompts/images.json
Fetching 10 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 59493.67it/s]
Fetching 13 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 101.75it/s]
04:45:15 | INFO | loading /data/models/huggingface/models--liuhaotian--llava-v1.6-vicuna-7b/snapshots/deae57a8c0ccb0da4c2661cc1891cc9d06503d11 with MLC
04:45:19 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=624000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=11040, driver_version=None
04:45:19 | INFO | loading llava-v1.6-vicuna-7b from /data/models/mlc/dist/llava-v1.6-vicuna-7b-ctx256/llava-v1.6-vicuna-7b-q4f16_ft/llava-v1.6-vicuna-7b-q4f16_ft-cuda.so
04:45:28 | WARNING | model library /data/models/mlc/dist/llava-v1.6-vicuna-7b-ctx256/llava-v1.6-vicuna-7b-q4f16_ft/llava-v1.6-vicuna-7b-q4f16_ft-cuda.so was missing metadata
04:46:23 | INFO | loading clip vision model openai/clip-vit-large-patch14-336
<class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'> openai/clip-vit-large-patch14-336 CLIPImageProcessor {
"_valid_processor_keys": [
"images",
"do_resize",
"size",
"resample",
"do_center_crop",
"crop_size",
"do_rescale",
"rescale_factor",
"do_normalize",
"image_mean",
"image_std",
"do_convert_rgb",
"return_tensors",
"data_format",
"input_data_format"
],
"crop_size": {
"height": 336,
"width": 336
},
"do_center_crop": true,
"do_convert_rgb": true,
"do_normalize": true,
"do_rescale": true,
"do_resize": true,
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "CLIPImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"resample": 3,
"rescale_factor": 0.00392156862745098,
"size": {
"shortest_edge": 336
}
}
<class 'transformers.models.clip.modeling_clip.CLIPVisionModelWithProjection'> openai/clip-vit-large-patch14-336 CLIPVisionModelWithProjection(
(vision_model): CLIPVisionTransformer(
(embeddings): CLIPVisionEmbeddings(
(patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
(position_embedding): Embedding(577, 1024)
)
(pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder): CLIPEncoder(
(layers): ModuleList(
(0-23): 24 x CLIPEncoderLayer(
(self_attn): CLIPAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(mlp): CLIPMLP(
(activation_fn): QuickGELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
)
(layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
)
(post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(visual_projection): Linear(in_features=1024, out_features=768, bias=False)
)
┌──────────────┬───────────────────────────────────┐
│ name │ openai/clip-vit-large-patch14-336 │
├──────────────┼───────────────────────────────────┤
│ input_shape │ (336, 336) │
├──────────────┼───────────────────────────────────┤
│ output_shape │ torch.Size([1, 768]) │
└──────────────┴───────────────────────────────────┘
04:47:44 | INFO | loading mm_projector weights from /data/models/huggingface/models--liuhaotian--llava-v1.6-vicuna-7b/snapshots/deae57a8c0ccb0da4c2661cc1891cc9d06503d11/mm_projector.bin
mm_projector Sequential(
(0): Linear(in_features=1024, out_features=4096, bias=True)
(1): GELU(approximate='none')
(2): Linear(in_features=4096, out_features=4096, bias=True)
)
┌────────────────────────────┬────────────────────────────────────────────────────────────────┐
│ _name_or_path │ ./checkpoints/vicuna-7b-v1-5 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ architectures │ ['LlavaLlamaForCausalLM'] │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ attention_bias │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ attention_dropout │ 0.0 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ bos_token_id │ 1 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ eos_token_id │ 2 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ freeze_mm_mlp_adapter │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ freeze_mm_vision_resampler │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ hidden_act │ silu │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ hidden_size │ 4096 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_aspect_ratio │ anyres │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_crop_resolution │ 224 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_grid_pinpoints │ [[336, 672], [672, 336], [672, 672], [1008, 336], [336, 1008]] │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ image_split_resolution │ 224 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ initializer_range │ 0.02 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ intermediate_size │ 11008 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ max_position_embeddings │ 4096 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_hidden_size │ 1024 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_patch_merge_type │ spatial_unpad │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_projector_lr │ │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_projector_type │ mlp2x_gelu │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_resampler_type │ │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_use_im_patch_token │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_use_im_start_end │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_select_feature │ patch │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_select_layer │ -2 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_tower │ openai/clip-vit-large-patch14-336 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ mm_vision_tower_lr │ 2e-06 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ model_type │ llama │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ num_attention_heads │ 32 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ num_hidden_layers │ 32 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ num_key_value_heads │ 32 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ pad_token_id │ 0 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ pretraining_tp │ 1 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ rms_norm_eps │ 1e-05 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ rope_scaling │ │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ rope_theta │ 10000.0 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tie_word_embeddings │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tokenizer_model_max_length │ 4096 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tokenizer_padding_side │ right │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ torch_dtype │ bfloat16 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ transformers_version │ 4.36.2 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tune_mm_mlp_adapter │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ tune_mm_vision_resampler │ False │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ unfreeze_mm_vision_tower │ True │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ use_cache │ True │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ use_mm_proj │ True │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ vocab_size │ 32000 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ name │ llava-v1.6-vicuna-7b │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ api │ mlc │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ quant │ q4f16_ft │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ type │ llama │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ max_length │ 256 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ prefill_chunk_size │ -1 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ load_time │ 150.4796289320002 │
├────────────────────────────┼────────────────────────────────────────────────────────────────┤
│ params_size │ 3232.7265625 │
└────────────────────────────┴────────────────────────────────────────────────────────────────┘
04:47:46 | INFO | using chat template 'vicuna-v1' for model llava-v1.6-vicuna-7b
04:47:46 | INFO | model 'llava-v1.6-vicuna-7b', chat template 'vicuna-v1' stop tokens: ['</s>'] -> [2]
>> PROMPT: /data/images/dogs.jpg
>> PROMPT: What breeds of dogs are in the image?
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 523, in _run
self._generate(stream)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 458, in _generate
output = self._prefill(input, # prefill_with_embed
File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
File "/usr/local/lib/python3.8/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
tvm.error.InternalError: Traceback (most recent call last):
[bt] (8) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x230) [0xfffebf9ac6c8]
[bt] (7) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()+0x210) [0xfffebf9aad58]
[bt] (6) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)+0x5e4) [0xfffebf9ab5bc]
[bt] (5) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x7c) [0xfffebf9a99fc]
[bt] (4) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::NDArray (tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)>::AssignTypedLambda<tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}>(tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)+0x10) [0xfffebf977638]
[bt] (3) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::TypedPackedFunc<tvm::runtime::NDArray (tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)>::AssignTypedLambda<tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}>(tvm::runtime::Registry::set_body_method<tvm::runtime::memory::Storage, tvm::runtime::memory::StorageObj, tvm::runtime::NDArray, long, tvm::runtime::ShapeTuple, DLDataType, void>(tvm::runtime::NDArray (tvm::runtime::memory::StorageObj::*)(long, tvm::runtime::ShapeTuple, DLDataType))::{lambda(tvm::runtime::memory::Storage, long, tvm::runtime::ShapeTuple, DLDataType)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const+0x27c) [0xfffebf977374]
[bt] (2) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::memory::StorageObj::AllocNDArray(long, tvm::runtime::ShapeTuple, DLDataType)+0x3a8) [0xfffebf9268c8]
[bt] (1) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x78) [0xfffebd57af58]
[bt] (0) /usr/local/lib/python3.8/dist-packages/tvm/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x30) [0xfffebf9236f0]
File "/opt/mlc-llm/3rdparty/tvm/src/runtime/memory/memory_manager.cc", line 108
InternalError: Check failed: (offset + needed_size <= this->buffer.size) is false: storage allocation failure, attempted to allocate 15360000 at offset 0 in region that is 11272192bytes
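If I read the allocation failure correctly, the prefill step is trying to store far more embedding tokens than the 256-token context I built with. Below is a rough back-of-envelope check, assuming the failed allocation is an fp16 buffer of hidden-size-4096 embeddings (both values taken from the config table above); the byte counts come from the error message.

# Back-of-envelope check of the failed allocation (assumption: fp16 values,
# hidden_size = 4096 as printed in the config table above).
hidden_size = 4096
bytes_per_value = 2  # fp16

requested_bytes = 15_360_000  # "attempted to allocate 15360000"
region_bytes = 11_272_192     # "in region that is 11272192bytes"

print(requested_bytes / (hidden_size * bytes_per_value))  # ~1875 tokens requested
print(region_bytes / (hidden_size * bytes_per_value))     # ~1376 tokens of space

If that reading is right, the LLaVA-1.6 image tiles alone would overflow --max-context-len 256. Should I simply rebuild with a larger context length, or is something else going on?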