Cannot start Live LLaVA on Jetson Orin Nano: "video source /dev/video0 timed out during capture"

Hi all,

I followed the Live LLaVA tutorial on Jetson AI Lab and this post:

I ran this command:

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output

This is the output I get:

Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- Finding compatible container image for ['nano_llm']
V4L2_DEVICES:  --device /dev/video0 
### DISPLAY environmental variable is already set: ":0"
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/lcmo/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250204_111816 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA1.5-3b --max-context-len 256 --max-new-tokens 32 --video-input /dev/video0 --video-output webrtc://@:8554/output --vision-api=hf
/usr/local/lib/python3.10/dist-packages/transformers/utils/ FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/ FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
Fetching 13 files: 100%|█████████████████████| 13/13 [00:00<00:00, 42366.71it/s]
Fetching 17 files: 100%|██████████████████████| 17/17 [00:00<00:00, 5149.73it/s]
11:18:30 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/42d1dda6807cc521ef27674ca2ae157539d17026 with MLC
11:18:35 | INFO | NumExpr defaulting to 6 threads.
11:18:35 | WARNING | AWQ not installed (requires JetPack 6 / L4T R36) - AWQ models will fail to initialize
11:18:37 | INFO | patching model config with {'model_type': 'llama'}
11:18:38 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=1020000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=12060, driver_version=None
11:18:38 | INFO | loading VILA1.5-3b from /data/models/mlc/dist/VILA1.5-3b/ctx256/VILA1.5-3b-q4f16_ft/
11:18:38 | WARNING | model library /data/models/mlc/dist/VILA1.5-3b/ctx256/VILA1.5-3b-q4f16_ft/ was missing metadata
11:18:54 | INFO | loading siglip vision model /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/42d1dda6807cc521ef27674ca2ae157539d17026/vision_tower
11:19:06 | INFO | loaded siglip vision model /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshots/42d1dda6807cc521ef27674ca2ae157539d17026/vision_tower
11:19:07 | INFO | mm_projector (mlp_downsample)  Sequential(
  (0): DownSampleBlock()
  (1): LayerNorm((4608,), eps=1e-05, elementwise_affine=True)
  (2): Linear(in_features=4608, out_features=2560, bias=True)
  (3): GELU(approximate='none')
  (4): Linear(in_features=2560, out_features=2560, bias=True)
11:19:07 | INFO | mm_projector weights:  dict_keys(['1.bias', '1.weight', '2.bias', '2.weight', '4.bias', '4.weight'])
│ _name_or_path              │ ./llm                                                                       │
│ architectures              │ ['LlamaForCausalLM']                                                        │
│ drop_path_rate             │ 0.0                                                                         │
│ hidden_size                │ 2560                                                                        │
│ image_aspect_ratio         │ resize                                                                      │
│ interpolate_mode           │ linear                                                                      │
│ mm_hidden_size             │ 1152                                                                        │
│ mm_projector_lr            │                                                                             │
│ mm_use_im_patch_token      │ False                                                                       │
│ mm_use_im_start_end        │ False                                                                       │
│ mm_vision_select_feature   │ cls_patch                                                                   │
│ mm_vision_select_layer     │ -2                                                                          │
│ model_dtype                │ torch.bfloat16                                                              │
│ model_type                 │ llama                                                                       │
│ num_video_frames           │ 8                                                                           │
│ resume_path                │ ./vlm                                                                       │
│ s2                         │ False                                                                       │
│ s2_max_split_size          │ 336                                                                         │
│ s2_scales                  │ 336,672,1008                                                                │
│ transformers_version       │ 4.36.2                                                                      │
│ tune_language_model        │ True                                                                        │
│ tune_mm_projector          │ True                                                                        │
│ tune_vision_tower          │ True                                                                        │
│ vision_resolution          │ -1                                                                          │
│ name                       │ VILA1.5-3b                                                                  │
│ api                        │ mlc                                                                         │
│ max_position_embeddings    │ 4096                                                                        │
│ mm_vision_tower            │ /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshot │
│ mm_projector_path          │ /data/models/huggingface/models--Efficient-Large-Model--VILA1.5-3b/snapshot │
│ mm_projector_type          │ mlp_downsample                                                              │
│ attention_bias             │ False                                                                       │
│ attention_dropout          │ 0.0                                                                         │
│ bos_token_id               │ 1                                                                           │
│ eos_token_id               │ 2                                                                           │
│ hidden_act                 │ silu                                                                        │
│ initializer_range          │ 0.02                                                                        │
│ intermediate_size          │ 6912                                                                        │
│ model_max_length           │ 4096                                                                        │
│ num_attention_heads        │ 20                                                                          │
│ num_hidden_layers          │ 32                                                                          │
│ num_key_value_heads        │ 20                                                                          │
│ pad_token_id               │ 0                                                                           │
│ pretraining_tp             │ 1                                                                           │
│ rms_norm_eps               │ 1e-05                                                                       │
│ rope_scaling               │                                                                             │
│ rope_theta                 │ 10000.0                                                                     │
│ tie_word_embeddings        │ False                                                                       │
│ tokenizer_model_max_length │ 4096                                                                        │
│ tokenizer_padding_side     │ right                                                                       │
│ torch_dtype                │ bfloat16                                                                    │
│ use_cache                  │ True                                                                        │
│ vocab_size                 │ 32000                                                                       │
│ quant                      │ q4f16_ft                                                                    │
│ type                       │ llama                                                                       │
│ max_length                 │ 256                                                                         │
│ prefill_chunk_size         │ -1                                                                          │
│ load_time                  │ 36.75651084200035                                                           │
│ params_size                │ 1300.8330078125                                                             │

11:19:07 | INFO | using chat template 'vicuna-v1' for model VILA1.5-3b
11:19:07 | INFO | model 'VILA1.5-3b', chat template 'vicuna-v1' stop tokens:  ['</s>'] -> [2]
11:19:07 | INFO | Warming up LLM with query 'What is 2+2?'
11:19:08 | INFO | Warmup response:  '4</s>'
11:19:08 | INFO | plugin | connected PrintStream to on_text on channel 0
11:19:08 | INFO | plugin | connected ChatQuery to PrintStream on channel 0
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

(gst-plugin-scanner:75): GLib-GObject-WARNING **: 11:19:09.431: cannot register existing type 'GstRtpSrc'

(gst-plugin-scanner:75): GLib-GObject-CRITICAL **: 11:19:09.431: g_type_add_interface_static: assertion 'G_TYPE_IS_INSTANTIATABLE (instance_type)' failed

(gst-plugin-scanner:75): GLib-CRITICAL **: 11:19:09.431: g_once_init_leave: assertion 'result != 0' failed

(gst-plugin-scanner:75): GStreamer-CRITICAL **: 11:19:09.431: gst_element_register: assertion 'g_type_is_a (type, GST_TYPE_ELEMENT)' failed

(gst-plugin-scanner:75): GLib-GObject-WARNING **: 11:19:09.431: cannot register existing type 'GstRtpSink'

(gst-plugin-scanner:75): GLib-GObject-CRITICAL **: 11:19:09.432: g_type_add_interface_static: assertion 'G_TYPE_IS_INSTANTIATABLE (instance_type)' failed

(gst-plugin-scanner:75): GLib-CRITICAL **: 11:19:09.432: g_once_init_leave: assertion 'result != 0' failed

(gst-plugin-scanner:75): GStreamer-CRITICAL **: 11:19:09.432: gst_element_register: assertion 'g_type_is_a (type, GST_TYPE_ELEMENT)' failed
sh: 1: lsmod: not found
sh: 1: modprobe: not found
[gstreamer] initialized gstreamer, version
[gstreamer] gstCamera -- attempting to create device v4l2:///dev/video0
[gstreamer] gstCamera -- didn't discover any v4l2 devices
[gstreamer] gstCamera -- device discovery failed, but /dev/video0 exists
[gstreamer]              support for compressed formats is disabled
[gstreamer] gstCamera pipeline string:
[gstreamer] v4l2src device=/dev/video0 do-timestamp=true ! nvv4l2decoder name=decoder enable-max-performance=1 ! video/x-raw(memory:NVMM) ! nvvidconv flip-method=0 ! video/x-raw ! appsink name=mysink sync=false
sh: 1: lsmod: not found
sh: 1: modprobe: not found
[gstreamer] gstCamera successfully created device v4l2:///dev/video0
[video]  created gstCamera from v4l2:///dev/video0
gstCamera video options:
  -- URI: v4l2:///dev/video0
     - protocol:  v4l2
     - location:  /dev/video0
  -- deviceType: v4l2
  -- ioType:     input
  -- codec:      unknown
  -- codecType:  v4l2
  -- width:      1280
  -- height:     720
  -- frameRate:  30
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- sslCert     /etc/ssl/private/localhost.cert.pem
  -- sslKey      /etc/ssl/private/localhost.key.pem
[gstreamer] gstEncoder -- codec not specified, defaulting to H.264
failed to find/open file /proc/device-tree/model
[gstreamer] gstEncoder -- detected board 'NVIDIA Jetson Orin Nano Engineering Reference Developer Kit Super'
[gstreamer] gstEncoder -- hardware encoder not detected, reverting to CPU encoder
[gstreamer] gstEncoder -- pipeline launch string:
[gstreamer] appsrc name=mysource is-live=true do-timestamp=true format=3 ! x264enc name=encoder bitrate=4000 speed-preset=ultrafast tune=zerolatency key-int-max=30 insert-vui=1 ! video/x-h264 ! rtph264pay config-interval=1 ! application/x-rtp,media=video,encoding-name=H264,clock-rate=90000,payload=96 ! tee name=videotee ! queue ! fakesink
[webrtc] WebRTC server started @ https://lcmo-desktop:8554
[webrtc] WebRTC server thread running...
[webrtc] websocket route added /output
[video]  created gstEncoder from webrtc://@:8554/output
gstEncoder video options:
  -- URI: webrtc://@:8554/output
     - protocol:  webrtc
     - location:
     - port:      8554
  -- deviceType: ip
  -- ioType:     output
  -- codec:      H264
  -- codecType:  cpu
  -- frameRate:  30
  -- bitRate:    4000000
  -- numBuffers: 4
  -- zeroCopy:   true
  -- latency     10
  -- sslCert     /etc/ssl/private/localhost.cert.pem
  -- sslKey      /etc/ssl/private/localhost.key.pem
11:19:10 | INFO | plugin | connected VideoSource to on_video on channel 0
11:19:11 | INFO | mounting webserver path /data/datasets/uploads to /images/uploads
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
11:19:11 | INFO | starting webserver @
11:19:11 | SUCCESS | VideoQuery - system ready
 * Serving Flask app 'nano_llm.web.server'
 * Debug mode: on
11:19:11 | INFO | WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (
 * Running on
 * Running on
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> nvvconv0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> decoder
11:19:11 | INFO | Press CTRL+C to quit
[gstreamer] gstreamer changed state from NULL to READY ==> v4l2src0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> nvvconv0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer changed state from READY to PAUSED ==> decoder
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> v4l2src0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer message new-clock ==> pipeline0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> nvvconv0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> decoder
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> v4l2src0
[gstreamer] gstreamer message stream-start ==> pipeline0
[gstreamer] gstCamera -- end of stream (EOS)
[gstreamer] gstreamer v4l2src0 ERROR Internal data stream error.
[gstreamer] gstreamer Debugging info: ../libs/gst/base/gstbasesrc.c(3127): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming stopped, reason not-negotiated (-4)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer message latency ==> mysink
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:13 | WARNING | video source /dev/video0 timed out during capture, re-trying...
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:16 | WARNING | video source /dev/video0 timed out during capture, re-trying...
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:18 | WARNING | video source /dev/video0 timed out during capture, re-trying...
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:21 | WARNING | video source /dev/video0 timed out during capture, re-trying...
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:23 | WARNING | video source /dev/video0 timed out during capture, re-trying...
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:26 | WARNING | video source /dev/video0 timed out during capture, re-trying...
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:28 | WARNING | video source /dev/video0 timed out during capture, re-trying...
[gstreamer] gstCamera::Capture() -- a timeout occurred waiting for the next image buffer
11:19:31 | WARNING | video source /dev/video0 timed out during capture, re-trying...
11:19:31 | ERROR | Re-initializing video source "/dev/video0"
[gstreamer] gstCamera -- stopping pipeline, transitioning to GST_STATE_NULL
[gstreamer] gstCamera -- pipeline stopped
[gstreamer] gstCamera -- attempting to create device v4l2:///dev/video0
Available Sensor modes :
Resolution: 3280 x 2464 ; Framerate = 21.000000; Analog Gain Range Min 1.000000, Max 10.625000, Exposure Range Min 13000, Max 683709000

Resolution: 3280 x 1848 ; Framerate = 28.000001; Analog Gain Range Min 1.000000, Max 10.625000, Exposure Range Min 13000, Max 683709000

Resolution: 1920 x 1080 ; Framerate = 29.999999; Analog Gain Range Min 1.000000, Max 10.625000, Exposure Range Min 13000, Max 683709000

Resolution: 1640 x 1232 ; Framerate = 29.999999; Analog Gain Range Min 1.000000, Max 10.625000, Exposure Range Min 13000, Max 683709000

Resolution: 1280 x 720 ; Framerate = 59.999999; Analog Gain Range Min 1.000000, Max 10.625000, Exposure Range Min 13000, Max 683709000

[gstreamer] gstCamera -- found v4l2 device: NvV4L2 Argus PLugin
[gstreamer] v4l2-proplist, device.path=(string)/dev/video0, udev-probed=(boolean)false, device.api=(string)v4l2, v4l2.device.driver=(string)"\ \(multi-NvV4L2\ Argus\ PLugin", v4l2.device.card=(string)"NvV4L2\ Argus\ PLugin", v4l2.device.bus_info=(string)platform:NV-ARGUS:1.000000, v4l2.device.version=(uint)0, v4l2.device.capabilities=(uint)2216693760, v4l2.device.device_caps=(uint)69210112;
[gstreamer] gstCamera -- found 2 caps for v4l2 device /dev/video0
[gstreamer] [0] video/x-raw, format=(string)NV12, width=(int)[ 48, 3280 ], height=(int)[ 48, 2464 ], framerate=(fraction)[ 0/1, 2147483647/1 ];
[gstreamer] [1] video/x-raw, format=(string)NV12, width=(int)[ 48, 3280 ], height=(int)[ 48, 2464 ], framerate=(fraction)[ 0/1, 2147483647/1 ], interlace-mode=(string)alternate;
[gstreamer] gstCamera -- couldn't find a compatible codec/format for v4l2 device /dev/video0
[gstreamer] gstCamera -- device discovery failed, but /dev/video0 exists
[gstreamer]              support for compressed formats is disabled
[gstreamer] gstCamera pipeline string:
[gstreamer] v4l2src device=/dev/video0 do-timestamp=true ! nvv4l2decoder name=decoder enable-max-performance=1 ! video/x-raw(memory:NVMM) ! nvvidconv flip-method=0 ! video/x-raw ! appsink name=mysink sync=false
[gstreamer] gstCamera successfully created device v4l2:///dev/video0
[video]  created gstCamera from v4l2:///dev/video0

These messages then repeat indefinitely.
How can I solve this problem?
Here is my nvidia-jetpack info:

$ sudo apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Source: nvidia-jetpack (6.2)
Version: 6.2+b77
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 6.2+b77), nvidia-jetpack-dev (= 6.2+b77)
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.2+b77_arm64.deb
Size: 29298
SHA256: 70553d4b5a802057f9436677ef8ce255db386fd3b5d24ff2c0a8ec0e485c59cd
SHA1: 9deab64d12eef0e788471e05856c84bf2a0cf6e6
MD5sum: 4db65dc36434fe1f84176843384aee23
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Package: nvidia-jetpack
Source: nvidia-jetpack (6.1)
Version: 6.1+b123
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 6.1+b123), nvidia-jetpack-dev (= 6.1+b123)
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_6.1+b123_arm64.deb
Size: 29312
SHA256: b6475a6108aeabc5b16af7c102162b7c46c36361239fef6293535d05ee2c2929
SHA1: f0984a6272c8f3a70ae14cb2ca6716b8c1a09543
MD5sum: a167745e1d88a8d7597454c8003fa9a4
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

I don't have a background in computer science or any related engineering field, so please let me know what further information I should provide to help you look into the issue. Thanks a lot!


Could you try the command below and share the logs with us for review?

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input $(readlink /dev/video0) \
    --video-output webrtc://@:8554/output



Thanks for your reply. Here are the logs from running the command above:

$ jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input $(readlink /dev/video0) \
    --video-output webrtc://@:8554/output \
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- Finding compatible container image for ['nano_llm']
V4L2_DEVICES:  --device /dev/video0 
### DISPLAY environmental variable is already set: ":0"
localuser:root being added to access control list
+ docker run --runtime nvidia -it --rm --network host --shm-size=8g --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /var/run/dbus:/var/run/dbus --volume /var/run/avahi-daemon/socket:/var/run/avahi-daemon/socket --volume /var/run/docker.sock:/var/run/docker.sock --volume /home/lcmo/jetson-containers/data:/data -v /etc/localtime:/etc/localtime:ro -v /etc/timezone:/etc/timezone:ro --device /dev/snd -e PULSE_SERVER=unix:/run/user/1000/pulse/native -v /run/user/1000/pulse:/run/user/1000/pulse --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 --device /dev/i2c-0 --device /dev/i2c-1 --device /dev/i2c-2 --device /dev/i2c-4 --device /dev/i2c-5 --device /dev/i2c-7 --device /dev/i2c-9 --name jetson_container_20250204_161554 dustynv/nano_llm:r36.4.0 python3 -m nano_llm.agents.video_query --api=mlc --model Efficient-Large-Model/VILA1.5-3b --max-context-len 256 --max-new-tokens 32 --video-input --video-output webrtc://@:8554/output --vision-api=hf
/usr/local/lib/python3.10/dist-packages/transformers/utils/ FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
usage: [-h] [--model MODEL] [--quantization QUANTIZATION]
                      [--api {auto_gptq,awq,hf,mlc}]
                      [--vision-api {auto,hf,trt}]
                      [--vision-model VISION_MODEL]
                      [--vision-scaling {crop,resize}] [--prompt [PROMPT ...]]
                      [--save-mermaid SAVE_MERMAID]
                      [--chat-template {llama-2,llama-3,llama-3.1,tiny-llama,sheared-llama,open-llama,vicuna-v0,vicuna-v1,chat-ml,chat-ml-tools,nous-obsidian,stablelm-zephyr,phi-2-chat,phi-2-instruct,gemma,bunny,openvla,llava-v0,llava-v1,llava-llama-2}]
                      [--system-prompt SYSTEM_PROMPT]
                      [--wrap-tokens WRAP_TOKENS]
                      [--max-context-len MAX_CONTEXT_LEN]
                      [--max-new-tokens MAX_NEW_TOKENS]
                      [--min-new-tokens MIN_NEW_TOKENS] [--do-sample]
                      [--temperature TEMPERATURE] [--top-p TOP_P]
                      [--repetition-penalty REPETITION_PENALTY]
                      [--video-input VIDEO_INPUT]
                      [--video-input-width VIDEO_INPUT_WIDTH]
                      [--video-input-height VIDEO_INPUT_HEIGHT]
                      [--video-input-codec {h264,h265,vp8,vp9,mjpeg}]
                      [--video-input-framerate VIDEO_INPUT_FRAMERATE]
                      [--video-input-save VIDEO_INPUT_SAVE]
                      [--video-output VIDEO_OUTPUT]
                      [--video-output-codec {h264,h265,vp8,vp9,mjpeg}]
                      [--video-output-bitrate VIDEO_OUTPUT_BITRATE]
                      [--video-output-save VIDEO_OUTPUT_SAVE]
                      [--nanodb NANODB] [--nanodb-model NANODB_MODEL]
                      [--nanodb-reserve NANODB_RESERVE] [--web-host WEB_HOST]
                      [--web-port WEB_PORT] [--ws-port WS_PORT]
                      [--ssl-key SSL_KEY] [--ssl-cert SSL_CERT]
                      [--upload-dir UPLOAD_DIR] [--web-trace]
                      [--web-title WEB_TITLE]
                      [--log-level {debug,info,warning,error,critical}]
                      [--debug] error: argument --video-input: expected one argument
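
The error above follows from how the shell expanded the suggested command: in the docker run line, --video-input is immediately followed by --video-output, so $(readlink /dev/video0) produced an empty string. Plain readlink only resolves symlinks, and /dev/video0 is a device node, not a symlink. The sketch below reproduces this without a camera; it assumes /dev/null as a stand-in for /dev/video0 (both are character devices) and GNU coreutils readlink.

```shell
#!/usr/bin/env bash
# Sketch: why "--video-input $(readlink /dev/video0)" lost its value.
# Assumption: /dev/null stands in for /dev/video0; both are device nodes,
# not symlinks, so this runs without a camera attached.
dev=/dev/null

# Plain readlink only resolves symlinks. On a device node it prints
# nothing and exits non-zero, so the command substitution is empty.
plain=$(readlink "$dev" || true)

# readlink -f canonicalizes any existing path, returning the path itself
# when it is not a symlink, so the value is never dropped.
canonical=$(readlink -f "$dev")

echo "readlink:    '${plain}'"
echo "readlink -f: '${canonical}'"

# Left unquoted, the empty substitution vanishes from the argument list,
# matching the "--video-input --video-output" seen in the docker run line.
set -- --video-input $plain --video-output webrtc://@:8554/output
echo "argc=$# argv: $*"
```

The last line prints argc=3, showing that --video-input ends up with no value of its own. Passing the path directly, or using readlink -f when symlink resolution is actually wanted, avoids the empty argument.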

Hi all,

After a lot of searching, I found that I probably need an extra setup step because I am using a CSI camera.

I found a useful hint on this page: LeRobot - NVIDIA Jetson AI Lab
I ran:

sudo apt update && sudo apt install v4l2loopback-dkms v4l-utils
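
Installing v4l2loopback-dkms builds the kernel module, but that does not necessarily mean it is loaded. A quick host-side check (the container has no lsmod or modprobe, as the first log shows) is to look for the module name in /proc/modules, whose first field on each line is a loaded module's name. This is a hedged sketch: module_loaded is a hypothetical helper, and the assumption that --csi2webcam depends on the v4l2loopback module comes from the LeRobot page's install step, not from verified jetson-containers internals.

```shell
#!/usr/bin/env bash
# Sketch: check whether a kernel module is currently loaded by reading
# /proc/modules (first field of each line is the module name).
# Assumption: --csi2webcam relies on the v4l2loopback module installed by
# v4l2loopback-dkms. Run this on the host, not inside the container.
module_loaded() {
  local name=$1
  local modules_file=${2:-/proc/modules}
  # Return 2 if the module list cannot be read at all.
  [ -r "$modules_file" ] || return 2
  # Exit 0 only if some line's first field equals the module name exactly.
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

if module_loaded v4l2loopback; then
  echo "v4l2loopback is loaded"
else
  echo "v4l2loopback is not loaded (try: sudo modprobe v4l2loopback)"
fi
```

Matching the whole first field, rather than grepping for a substring, keeps related modules such as videodev from producing a false positive.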

Then I changed --video-input /dev/video0 to --video-input /dev/video1 and added --csi2webcam right after jetson-containers run.
The full command:

jetson-containers run --csi2webcam $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video1 \
    --video-output webrtc://@:8554/output

This works.
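
To confirm which node to pass to --video-input after enabling --csi2webcam, it can help to list the /dev/video* nodes together with their V4L2 card names. In the first log, the raw CSI node reported the card "NvV4L2 Argus PLugin", so a node with a different card name is likely the loopback webcam. The sketch below is hypothetical (list_video_nodes is an illustrative name) and assumes v4l2-ctl from the v4l-utils package installed above; without it, only the paths are printed.

```shell
#!/usr/bin/env bash
# Sketch: list V4L2 nodes so the raw CSI/Argus node can be told apart from
# the loopback webcam node created by --csi2webcam.
# Assumption: v4l2-ctl comes from v4l-utils; if it is missing, or a path is
# not a character device, only the path is printed.
list_video_nodes() {
  local dir=${1:-/dev}
  local node card
  shopt -s nullglob
  for node in "$dir"/video*; do
    if [ -c "$node" ] && command -v v4l2-ctl >/dev/null 2>&1; then
      # "Card type" was "NvV4L2 Argus PLugin" for the raw CSI node in the log.
      card=$(v4l2-ctl -d "$node" --info 2>/dev/null |
        sed -n 's/^[[:space:]]*Card type[[:space:]]*:[[:space:]]*//p')
      echo "$node ${card:-unknown}"
    else
      echo "$node"
    fi
  done
  shopt -u nullglob
}

list_video_nodes "$@"
```

Run it on the host with no arguments to scan /dev; the optional directory argument exists only so the function can be exercised without real devices.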

However, when I visit https://<IP_ADDRESS>:8050, I can see the web UI but the live video is missing. Can this be solved on the Orin Nano? Thanks!


Did you try opening it locally on the Jetson itself?
Or with your device's IP address?



I tried opening it on the Jetson Orin Nano itself, but the live video does not play.

I also tried opening https://<MY_IP_ADDRESS>:8050 from my PC on the same network, but the program crashed with these logs:

18:14:37 | INFO | refresh rate:  1.24 FPS (808.0 ms)
A man is taking a selfie in a room.18:14:37 | INFO | - - [04/Feb/2025 18:14:37] "GET / HTTP/1.1" 200 -

18:14:37 | INFO | refresh rate:  1.17 FPS (855.0 ms)
A man is18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/chat.css HTTP/1.1" 200 -
 taking18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/bootstrap.css HTTP/1.1" 200 -
18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/select2.min.css HTTP/1.1" 200 -
 a18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/webrtc.js HTTP/1.1" 200 -
 self18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/rest.js HTTP/1.1" 200 -
ie18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/debounce.js HTTP/1.1" 200 -
18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/websocket.js HTTP/1.1" 200 -
 in18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/jquery-3.6.3.min.js HTTP/1.1" 200 -
 a18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/bootstrap.bundle.min.js HTTP/1.1" 200 -
18:14:38 | INFO | - - [04/Feb/2025 18:14:38] "GET /static/select2.min.js HTTP/1.1" 200 -
18:14:38 | INFO | refresh rate:  1.16 FPS (863.7 ms)
18:14:38 | INFO | connection open
18:14:38 | INFO | new websocket connection from ('', 51220)
18:14:38 | INFO | listening on websocket connection from ('', 51220)
[webrtc] websocket /output -- new connection opened by (peer_id=0)
[webrtc] new WebRTC peer connecting (, peer_id=0)
ERROR:/opt/jetson-utils/codec/gstEncoder.cpp:876:static void gstEncoder::onWebsocketMessage(WebRTCPeer*, const char*, size_t, void*): 'sinkpad' should not be nullptr
Bail out! ERROR:/opt/jetson-utils/codec/gstEncoder.cpp:876:static void gstEncoder::onWebsocketMessage(WebRTCPeer*, const char*, size_t, void*): 'sinkpad' should not be nullptr
Fatal Python error: Aborted

Thread 0x0000fffdda74f120 (most recent call first):
  File "/usr/lib/python3.10/", line 1161 in read
  File "/usr/lib/python3.10/", line 1288 in recv
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/", line 561 in recv_events
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/", line 196 in recv_events
  File "/usr/lib/python3.10/", line 953 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000fffddb76f120 (most recent call first):
  File "/usr/lib/python3.10/", line 320 in wait
  File "/usr/lib/python3.10/", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/", line 96 in get
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/", line 207 in recv
  File "/opt/NanoLLM/nano_llm/web/", line 335 in websocket_listener
  File "/opt/NanoLLM/nano_llm/web/", line 314 in on_websocket
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/", line 575 in conn_handler
  File "/usr/lib/python3.10/", line 953 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000ffff2c32f120 (most recent call first):
  File "/usr/lib/python3.10/", line 416 in select
  File "/usr/lib/python3.10/", line 232 in serve_forever
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/", line 817 in serve_forever
  File "/usr/local/lib/python3.10/dist-packages/werkzeug/", line 1123 in run_simple
  File "/usr/local/lib/python3.10/dist-packages/flask/", line 625 in run
  File "/opt/NanoLLM/nano_llm/web/", line 120 in <lambda>
  File "/usr/lib/python3.10/", line 953 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000ffff2d34f120 (most recent call first):
  File "/usr/lib/python3.10/", line 469 in select
  File "/usr/local/lib/python3.10/dist-packages/websockets/sync/", line 260 in serve_forever
  File "/opt/NanoLLM/nano_llm/web/", line 119 in <lambda>
  File "/usr/lib/python3.10/", line 953 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000ffff3b3cf120 (most recent call first):
  File "/opt/NanoLLM/nano_llm/plugins/video/", line 109 in capture
  File "/opt/NanoLLM/nano_llm/plugins/video/", line 159 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000ffff38bbf120 (most recent call first):
  File "/opt/NanoLLM/nano_llm/agents/", line 313 in poll_keyboard
  File "/usr/lib/python3.10/", line 953 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000ffff363af120 (most recent call first):
  File "/usr/lib/python3.10/", line 324 in wait
  File "/usr/lib/python3.10/", line 607 in wait
  File "/opt/NanoLLM/nano_llm/", line 335 in process_inputs
  File "/opt/NanoLLM/nano_llm/", line 321 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000fffe277ef120 (most recent call first):
  File "/usr/lib/python3.10/", line 324 in wait
  File "/usr/lib/python3.10/", line 607 in wait
  File "/opt/NanoLLM/nano_llm/", line 335 in process_inputs
  File "/opt/NanoLLM/nano_llm/", line 321 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000fffe27fff120 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/", line 2576 in layer_norm
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 202 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1562 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1553 in _wrapped_call_impl
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/siglip/", line 439 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1562 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1553 in _wrapped_call_impl
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/siglip/", line 671 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1562 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1553 in _wrapped_call_impl
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/siglip/", line 858 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1562 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1553 in _wrapped_call_impl
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/siglip/", line 957 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1562 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1553 in _wrapped_call_impl
  File "/opt/clip_trt/clip_trt/", line 115 in forward
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1562 in _call_impl
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/", line 1553 in _wrapped_call_impl
  File "/opt/clip_trt/clip_trt/", line 218 in embed_image
  File "/opt/clip_trt/clip_trt/", line 246 in __call__
  File "/opt/NanoLLM/nano_llm/", line 267 in embed_image
  File "/opt/NanoLLM/nano_llm/chat/", line 256 in _embed_image
  File "/opt/NanoLLM/nano_llm/chat/", line 215 in embed
  File "/opt/NanoLLM/nano_llm/chat/", line 369 in embed_chat
  File "/opt/NanoLLM/nano_llm/plugins/", line 189 in process
  File "/opt/NanoLLM/nano_llm/plugins/", line 152 in process
  File "/opt/NanoLLM/nano_llm/", line 361 in dispatch
  File "/opt/NanoLLM/nano_llm/", line 348 in process_inputs
  File "/opt/NanoLLM/nano_llm/", line 321 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000fffe6cdff120 (most recent call first):
  File "/usr/lib/python3.10/", line 320 in wait
  File "/usr/lib/python3.10/", line 171 in get
  File "/opt/NanoLLM/nano_llm/models/", line 537 in _run
  File "/usr/lib/python3.10/", line 953 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000fffee17af120 (most recent call first):
  File "/usr/lib/python3.10/", line 324 in wait
  File "/usr/lib/python3.10/", line 607 in wait
  File "/usr/local/lib/python3.10/dist-packages/tqdm/", line 60 in run
  File "/usr/lib/python3.10/", line 1016 in _bootstrap_inner
  File "/usr/lib/python3.10/", line 973 in _bootstrap

Thread 0x0000ffff90dc96c0 (most recent call first):
  File "/usr/lib/python3.10/", line 1116 in _wait_for_tstate_lock
  File "/usr/lib/python3.10/", line 1096 in join
  File "/opt/NanoLLM/nano_llm/", line 58 in run
  File "/opt/NanoLLM/nano_llm/agents/", line 357 in <module>
  File "/usr/lib/python3.10/", line 86 in _run_code
  File "/usr/lib/python3.10/", line 196 in _run_module_as_main

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, zstandard.backend_c,, yaml._yaml, sentencepiece._sentencepiece, psutil._psutil_linux, psutil._psutil_posix, PIL._imaging, PIL._imagingft, google.protobuf.pyext._message, jetson_utils_python, cuda._lib.utils, cuda._cuda.ccuda, cuda.ccuda, cuda.cuda, cuda._cuda.cnvrtc, cuda.cnvrtc, cuda.nvrtc, cuda._lib.ccudart.utils, cuda._lib.ccudart.ccudart, cuda.ccudart, cuda.cudart, _cffi_backend, pyaudio._portaudio, markupsafe._speedups, websockets.speedups, regex._regex, scipy._lib._ccallback_c, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.runtime._nrt_python,, numba.experimental.jitclass._box, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, 
scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps,, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, tvm._ffi._cy3.core (total: 145)



Did you disable chrome://flags#enable-webrtc-hide-local-ips-with-mdns?



I didn’t, but I was using Firefox on the Jetson Orin Nano.

I also tried visiting https://<MY_IP_ADDRESS>:8050 from Safari on my MacBook, on the same network. The problem remains: no live video is playing.

Is there anything I am missing? Thanks.

I also tried NanoOWL; no video plays there either.

I am buying a USB camera. I hope this is just a problem with my CSI camera.


Could you change webrtc to rtp?

For example:

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output rtp://@:8554/output

Some useful links you could refer to:


Hi David,

Thanks. I tried replacing webrtc with rtp, but the issue remains.


Some users have shared a workaround in dusty-nv/jetson-utils#185 (comment).

Please run the command below inside the container, then rerun with the webrtc output:

apt install -y gstreamer1.0-nice

We have verified this on our side.
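Before rerunning, it may help to confirm that the relevant GStreamer elements are actually visible inside the container. The sketch below shells out to `gst-inspect-1.0`, which exits with a non-zero status for unknown elements. The assumption (not confirmed in this thread) is that the WebRTC output relies on `webrtcbin` plus the libnice elements `nicesrc`/`nicesink`, which the `gstreamer1.0-nice` package provides. The helper name `gst_element_available` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def gst_element_available(name: str) -> Optional[bool]:
    """Return True/False depending on whether gst-inspect-1.0 finds `name`,
    or None if gst-inspect-1.0 itself is not installed."""
    tool = shutil.which("gst-inspect-1.0")
    if tool is None:
        return None
    # gst-inspect-1.0 exits non-zero when the element or plugin is unknown
    result = subprocess.run(
        [tool, name], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Assumption: these are the elements the WebRTC path needs; nicesrc and
    # nicesink come from the gstreamer1.0-nice package.
    for element in ("webrtcbin", "nicesrc", "nicesink"):
        found = gst_element_available(element)
        if found is None:
            print("gst-inspect-1.0 not found; GStreamer tools are not installed")
            break
        print(f"{element}: {'found' if found else 'MISSING'}")
```

If `nicesrc` or `nicesink` reports MISSING, that would be consistent with the `'sinkpad' should not be nullptr` abort in the log, though this check alone does not prove it.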


Hi David,

I ran the command and tried again, but there is still no video playing in the WebUI. I think this may be a driver problem with my IMX219. I will try again when my USB camera arrives and report back ASAP.

I ran the following command instead:

jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input csi://0 \
    --video-output display://0

I changed the video input to csi://0 and the output to display://0.
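To check whether the camera delivers frames at all, independent of the VLM, a minimal capture-and-render loop with the `jetson_utils` Python bindings can be pointed at either `/dev/video0` or `csi://0`. This is a sketch assuming the `videoSource`/`videoOutput` API used by the jetson-utils `video-viewer` example, and assuming a recent version in which `Capture()` returns `None` on a timeout (older versions may raise an exception instead). `run_passthrough`, `main`, and the frame/timeout limits are hypothetical.

```python
import sys


def run_passthrough(source, output, max_frames=300, max_timeouts=5):
    """Copy frames from `source` to `output` until `max_frames` are rendered,
    `max_timeouts` captures time out, or either stream stops.
    Returns (frames_rendered, capture_timeouts)."""
    frames = timeouts = 0
    while frames < max_frames:
        img = source.Capture()
        if img is None:  # assumed: recent jetson_utils returns None on timeout
            timeouts += 1
            if timeouts >= max_timeouts:
                break
            continue
        output.Render(img)
        frames += 1
        # Check after capturing, since a source may only open on first Capture()
        if not source.IsStreaming() or not output.IsStreaming():
            break
    return frames, timeouts


def main(argv):
    try:
        from jetson_utils import videoSource, videoOutput
    except ImportError:
        print("jetson_utils is not importable; run this inside the nano_llm container")
        return 1
    src = argv[0] if len(argv) > 0 else "csi://0"
    dst = argv[1] if len(argv) > 1 else "display://0"
    frames, timeouts = run_passthrough(videoSource(src), videoOutput(dst))
    print(f"{src} -> {dst}: {frames} frames rendered, {timeouts} capture timeouts")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, saving this as a hypothetical `passthrough.py` and running it once with `/dev/video0 display://0` and once with `csi://0 display://0` would show whether the capture timeouts follow the input URI rather than the model.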

Now a pop-up window plays the live video and shows the prompt and the VLM output. However, a new problem has come up: the VLM output is sometimes delayed. Sometimes the description of a frame stays on screen for maybe 10 seconds, even though I can see the video is live. Is this because the Orin Nano doesn’t have enough computational power, or am I running into another problem?
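One way to reason about whether such a lag is simply compute-bound is a toy timing model. The sketch below is not NanoLLM's scheduling logic; it compares two hypothetical policies, queuing every frame (FIFO) versus always taking the newest frame, for an assumed 33 ms frame interval (about 30 fps) and an assumed 1 s per VLM query. The function name `description_lag_ms` and all numbers are illustrative assumptions.

```python
def description_lag_ms(frame_ms: int, infer_ms: int, duration_ms: int,
                       keep_latest: bool) -> int:
    """Age (ms) of the frame behind the last description finished within
    `duration_ms`, measured at the moment that description finishes.

    keep_latest=True:  the model always takes the newest captured frame.
    keep_latest=False: every frame is queued and processed in order (FIFO).
    """
    if frame_ms <= 0 or infer_ms <= 0:
        raise ValueError("frame_ms and infer_ms must be positive")
    now = 0          # time at which the model becomes free
    next_frame = 0   # next queued frame index (FIFO policy only)
    lag = 0
    while True:
        if keep_latest:
            idx = now // frame_ms    # newest frame captured by `now`
        else:
            idx = next_frame
            next_frame += 1
        captured = idx * frame_ms
        start = max(now, captured)   # cannot process a frame before it exists
        done = start + infer_ms
        if done > duration_ms:
            return lag
        lag = done - captured
        now = done


# After one minute with the assumed numbers:
print(description_lag_ms(33, 1000, 60_000, keep_latest=True))   # 1029
print(description_lag_ms(33, 1000, 60_000, keep_latest=False))  # 58053
```

In this toy model, taking the newest frame keeps the lag near one inference time plus one frame interval, while a FIFO queue makes it grow the longer the program runs. A lag that stays roughly constant would therefore point more toward slow individual queries than toward frames piling up.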



Yes, we verified this with a USB camera.

Did you enable super mode and run jetson_clocks to maximize the clock frequencies?



Yes, I enabled super mode and ran sudo jetson_clocks.
Video playback is smooth, but the description sometimes describes what happened 10 seconds earlier.
I don’t know if this is also a problem with my CSI camera.



Please try with a USB camera and let us know whether the issue persists.
