NanoVLM Issue on Jetson Orin Nano

Hi Team

We are using a Jetson Orin Nano 8GB with the latest JetPack version 6.0, and we are trying to run the NanoVLM model. We are following the steps below:

  1. Cloned the repository from the dusty-nv containers link: GitHub - dusty-nv/jetson-containers: Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

  2. After cloning it, ran the following command to install the packages: bash jetson-containers/install.sh

  3. Once that completed, pulled the nano_llm image using the command: jetson-containers run $(autotag nano_llm)

  4. On completion of the above steps, I tried to run the command on the Jetson Orin Nano to download and run inference on the model, as mentioned in this link: NanoVLM - NVIDIA Jetson AI Lab (jetson-ai-lab.com). I tried only the first command with the model [VILA-2.7b], but it gets killed automatically. I am attaching the error below.

Error:
root@ubuntu:/home/orin/Downloads/jetson-containers# jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:24.5-r36.2.0
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/orin/Downloads/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 dustynv/nano_llm:24.5-r36.2.0 python3 -m nano_llm.chat --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 42281.29it/s]
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 35951.18it/s]
10:40:21 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
10:40:22 | INFO | running MLC quantization:

python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx256 --use-safetensors

Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param:   0%|          | 0/197 [00:00<?, ?tensors/s] Start computing and quantizing weights… This may take a while. | 0/327 [00:00<?, ?tensors/s]
Get old param:   1%|█▏        | 2/197 [00:03<04:12, 1.29s/tensors] | 1/327 [00:03<16:47, 3.09s/tensors]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/NanoLLM/nano_llm/chat/__main__.py", line 30, in <module>
    model = NanoLLM.from_pretrained(
  File "/opt/NanoLLM/nano_llm/nano_llm.py", line 73, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 59, in __init__
    quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
  File "/opt/NanoLLM/nano_llm/models/mlc.py", line 278, in quantize
    subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx256 --use-safetensors ' died with <Signals.SIGKILL: 9>.

Even though we have enough disk space on the device, as shown below:

df -h
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 915G 56G 813G 7% /

Can you help us with this issue?

Regards
Gomathy Sankaran

Hi @Shankaran, can you try mounting SWAP, disabling ZRAM, and if needed disabling the desktop GUI during the model building/quantization phase that it is running out of memory on - here is info about doing that: https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#mounting-swap
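For reference, the gist of that page is roughly the following, run on the host (not inside the container); the swap file path and size here are just examples, so adjust them to your setup:

sudo systemctl disable nvzramconfig      # disable ZRAM
sudo fallocate -l 16G /ssd/16GB.swap     # allocate a swap file (example path/size)
sudo mkswap /ssd/16GB.swap
sudo swapon /ssd/16GB.swap
echo "/ssd/16GB.swap  none  swap  sw  0  0" | sudo tee -a /etc/fstab   # optional: persist across reboots
sudo init 3                              # optional: temporarily stop the desktop GUI to free memory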

Hi @dusty_nv, I followed the link you mentioned and it works well; I am able to download and quantize the model without any error. But after that it downloads the CLIP model and tries to optimize and convert it to TensorRT, and after some time it does not give any prompt as shown in the link NanoVLM - NVIDIA Jetson AI Lab (jetson-ai-lab.com); it directly exits the container. I am attaching the error below. Sometimes the Orin Nano also restarts or turns off.

I also tried to clear the buffer/page cache using the command sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches', but nothing changed.

Error:
jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:24.5-r36.2.0
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/orin/Downloads/jetson-containers/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:1 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 dustynv/nano_llm:24.5-r36.2.0 python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead.
  warnings.warn(
Fetching 10 files: 100%|██████████| 10/10 [00:00<00:00, 61052.46it/s]
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 9272.60it/s]
12:09:46 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
12:09:48 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=624000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=12020, driver_version=None
12:09:48 | INFO | loading VILA-2.7b from /data/models/mlc/dist/VILA-2.7b-ctx256/VILA-2.7b-q4f16_ft/VILA-2.7b-q4f16_ft-cuda.so
12:09:49 | WARNING | model library /data/models/mlc/dist/VILA-2.7b-ctx256/VILA-2.7b-q4f16_ft/VILA-2.7b-q4f16_ft-cuda.so was missing metadata
12:09:50 | INFO | loading clip vision model openai/clip-vit-large-patch14-336
<class 'nano_llm.vision.clip.CLIPImageEmbedding.__init__.<locals>.VisionEncoder'> openai/clip-vit-large-patch14-336 VisionEncoder(
  (model): CLIPVisionModelWithProjection(
    (vision_model): CLIPVisionTransformer(
      (embeddings): CLIPVisionEmbeddings(
        (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
        (position_embedding): Embedding(577, 1024)
      )
      (pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (encoder): CLIPEncoder(
        (layers): ModuleList(
          (0-23): 24 x CLIPEncoderLayer(
            (self_attn): CLIPAttention(
              (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
              (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
            )
            (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (mlp): CLIPMLP(
              (activation_fn): QuickGELUActivation()
              (fc1): Linear(in_features=1024, out_features=4096, bias=True)
              (fc2): Linear(in_features=4096, out_features=1024, bias=True)
            )
            (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          )
        )
      )
      (post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    )
    (visual_projection): Linear(in_features=1024, out_features=768, bias=False)
  )
)
┌──────────────┬────────────────────────────────────┐
│ name         │ openai/clip-vit-large-patch14-336  │
├──────────────┼────────────────────────────────────┤
│ input_shape  │ (336, 336)                         │
├──────────────┼────────────────────────────────────┤
│ output_shape │ torch.Size([1, 768])               │
└──────────────┴────────────────────────────────────┘
12:09:57 | INFO | optimizing openai/clip-vit-large-patch14-336 with TensorRT…
[05/20/2024-12:09:57] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 481, GPU 6736 (MiB)
[05/20/2024-12:09:58] [TRT] [V] Trying to load shared library libnvinfer_builder_resource.so.8.6.2
[05/20/2024-12:09:58] [TRT] [V] Loaded shared library libnvinfer_builder_resource.so.8.6.2
[05/20/2024-12:10:05] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1154, GPU +748, now: CPU 1671, GPU 7411 (MiB)
[05/20/2024-12:10:06] [TRT] [V] CUDA lazy loading is enabled.
/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py:279: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py:319: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):


[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /opt/onnxruntime/onnxruntime/core/graph/model.cc:179 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9
1] [TRT] [V] --------------- Timing Runner: /module/model/vision_model/embeddings/patch_embedding/Conv (CaskFlattenConvolution[0x80000036])
[05/20/2024-12:24:41] [TRT] [V] CaskFlattenConvolution has no valid tactics for this config, skipping
[05/20/2024-12:24:41] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: 0x1bf48a356bd0c083
[05/20/2024-12:24:41] [TRT] [V] =============== Computing costs for {ForeignNode[module.model.vision_model.embeddings.position_embedding.weight…/module/model/visual_projection/MatMul]}
[05/20/2024-12:24:41] [TRT] [V] *************** Autotuning format combination: Half(589824,576,24,1) → Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(590848,1024,1), Float(768,1) ***************
[05/20/2024-12:24:41] [TRT] [V] --------------- Timing Runner: {ForeignNode[module.model.vision_model.embeddings.position_embedding.weight…/module/model/visual_projection/MatMul]} (Myelin[0x80000023])

Can you help me with this?

Hi @Shankaran, can you try running it with --vision-api=hf? That should allow you to get past the TRT issue. You also appear to be running an older version of the container, which has since been updated to automatically disable this on Nano. Try doing this:

cd /path/to/your/jetson-containers
git pull
docker pull $(autotag nano_llm)
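With the updated container, the full chat command with the HuggingFace vision backend would then look roughly like this (same model and context settings as before):

jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32 --vision-api=hf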

Hi @dusty_nv, thanks a lot for your help. Using --vision-api=hf it works well, and I have downloaded the latest nano_llm Docker image. Testing with an image works fine and I am able to get the correct prompt, but when running on a video with the following command

CMD: jetson-containers run $(autotag nano_llm) python3 -m nano_llm.vision.video --model Efficient-Large-Model/VILA2.7b --max-images 8 --max-new-tokens 48 --video-input /data/my_video.mp4 --video-output /data/my_output.mp4 --prompt 'What changes occurred in the video?'

it throws an error: /usr/local/bin/python3: No module named nano_llm.vision.video
Do I need to download any image separately for video purposes, including a live camera feed?

Also, is there any repository I can follow for deploying and running inference with the VILA models on a Quadro RTX 4000 (Ubuntu 18.04) with CUDA version 12.0?

Can you help me with these?

Hi @dusty_nv, I am able to pass the video as an input with the nano_llm.agents.video_query command, but I am not able to save the video. Can you help us with this issue? Also, are there any updates on the questions in the above post?
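For reference, the command I am running is along these lines (assuming video_query accepts the same --video-input/--video-output options as vision.video; the paths are just examples):

jetson-containers run $(autotag nano_llm) python3 -m nano_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32 --video-input /data/my_video.mp4 --video-output /data/my_output.mp4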

@Shankaran you may need to run docker pull $(autotag nano_llm) to update your container image, as that nano_llm.vision.video command was added more recently.


Hi @dusty_nv, it worked well for vision.video, thanks for the support. Using the VILA model we are getting generic responses; is there any instruction-based tuning or fine-tuning available for this model? If there is, can you share the link and the procedure that needs to be followed for that model?

Typically you would use a bigger model to get more detailed descriptions of the image, but if you are on Orin Nano it is very tight to fit a 7B/8B VLM (including the vision encoder, etc.) in memory. Here is the VILA repo, which includes the training:

Thanks @dusty_nv,

  1. While using vision.video, sometimes it is able to save the video, but after some time, if I run the same process again, it is not able to save the video and pops up a low-memory warning. Do I need to increase the swap space as you mentioned earlier, or what should I do for this?

  2. If I make any changes to the code in my local directory, they are not reflected inside the container. Do I need to rebuild it? As of now I am only using this command: jetson-containers run $(autotag nano_llm)
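Would mounting my local copy over the container's copy be the intended workflow? A rough sketch of what I mean (assuming jetson-containers run forwards extra docker options, and using an example host path):

jetson-containers run -v /home/orin/my_nano_llm:/opt/NanoLLM/nano_llm $(autotag nano_llm)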