Please provide the following information when creating a topic:
- Hardware Platform (GPU model and numbers): Jetson Thor
- Ubuntu Version: 24.04.3
- NVIDIA GPU Driver Version (valid for GPU only): 580.00
- Jetpack version: 7.0
- CUDA version: 13.0
Hello, I’m attempting to deploy the NVILA model using the docker remote_llm_deployment setup on a Jetson Thor, but I’m getting these errors:
```
via-server-1 | GPU has 2 decode engines
via-server-1 | Free GPU memory is [N/A] MiB
via-server-1 | /opt/nvidia/via/start_via.sh: line 77: [: [N/A]: integer expression expected
via-server-1 | Total GPU memory is 125772 MiB per GPU
via-server-1 | Auto-selecting VLM Batch Size to 128
via-server-1 | Using nvila
via-server-1 | Starting VIA server in release mode
via-server-1 | 2026-01-20 14:53:53,240 INFO Initializing VIA Stream Handler
via-server-1 | 2026-01-20 14:53:53,243 INFO {'gdino_engine': '/root/.via/ngc_model_cache//cv_pipeline_models/swin.fp16.engine', 'tracker_config': '/tmp/via_tracker_config.yml', 'inference_interval': 1}
via-server-1 | 2026-01-20 14:53:53,243 INFO Initializing VLM pipeline
via-server-1 | 2026-01-20 14:53:53,252 INFO Have peer access: True
via-server-1 | 2026-01-20 14:53:53,253 INFO Using model cached at /root/.via/ngc_model_cache/nvidia_tao_nvila-highres_nvila-lite-15b-highres-lita
via-server-1 | 2026-01-20 14:53:53,254 INFO GPUs per VLM instance: 1
via-server-1 | 2026-01-20 14:53:53,254 INFO num_vlm_procs set to 1
via-server-1 | INFO: Started server process [160]
via-server-1 | INFO: Waiting for application startup.
via-server-1 | INFO: Application startup complete.
via-server-1 | INFO: Uvicorn running on http://127.0.0.1:60000 (Press CTRL+C to quit)
via-server-1 | 2026-01-20 14:53:55,695 INFO Initializing VlmProcess-0
via-server-1 | 2026-01-20 14:53:55,695 INFO Initializing DecoderProcess-0
via-server-1 | Process VlmProcess-1:
via-server-1 | Traceback (most recent call last):
via-server-1 | File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
via-server-1 | self.run()
via-server-1 | File "/opt/nvidia/via/via-engine/vlm_pipeline/process_base.py", line 240, in run
via-server-1 | if not self._initialize():
via-server-1 | ^^^^^^^^^^^^^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 730, in _initialize
via-server-1 | self._model = NVila(
via-server-1 | ^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/models/nvila/nvila_model.py", line 44, in __init__
via-server-1 | import tensorrt_llm
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
via-server-1 | import tensorrt_llm.functional as functional
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
via-server-1 | from . import graph_rewriting as gw
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/graph_rewriting.py", line 11, in <module>
via-server-1 | from ._utils import trt_gte
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_utils.py", line 42, in <module>
via-server-1 | from tensorrt_llm.bindings import DataType, GptJsonConfig
via-server-1 | ImportError: libcublas.so.12: cannot open shared object file: No such file or directory
via-server-1 | 2026-01-20 14:53:58,787 INFO Warmup DecoderProcess-0
via-server-1 | /bin/dash: 1: lsmod: not found
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,150 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,498 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,748 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,982 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:54:00,220 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:54:00,462 INFO Video stream found.
via-server-1 | 2026-01-20 14:54:00,557 INFO Warmup DecoderProcess-0 done
via-server-1 | 2026-01-20 14:54:00,559 INFO Initialized DecoderProcess-0
via-server-1 | 2026-01-20 14:54:01,561 INFO Stopping VLM pipeline
via-server-1 | [W120 14:54:02.704658614 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
via-server-1 | 2026-01-20 14:54:02,778 INFO Stopped VLM pipeline
via-server-1 | 2026-01-20 14:54:02,779 ERROR Failed to load VIA stream handler - Failed to load VLM on GPU 0
via-server-1 | Traceback (most recent call last):
via-server-1 | File "/opt/nvidia/via/via-engine/via_server.py", line 254, in run
via-server-1 | self._stream_handler = ViaStreamHandler(self._args)
via-server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 592, in __init__
via-server-1 | self._vlm_pipeline = VlmPipeline(args.asset_dir, args)
via-server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 1560, in __init__
via-server-1 | raise Exception(f"Failed to load VLM on GPU {idx}")
via-server-1 | Exception: Failed to load VLM on GPU 0
via-server-1 |
via-server-1 | During handling of the above exception, another exception occurred:
via-server-1 |
via-server-1 | Traceback (most recent call last):
via-server-1 | File "/opt/nvidia/via/via-engine/via_server.py", line 2371, in <module>
via-server-1 | server.run()
via-server-1 | File "/opt/nvidia/via/via-engine/via_server.py", line 256, in run
via-server-1 | raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")
via-server-1 | via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - Failed to load VLM on GPU 0
via-server-1 | Killed process with PID 157
via-server-1 exited with code 1
```
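The fatal line appears to be the `ImportError: libcublas.so.12` raised while importing `tensorrt_llm.bindings`. Since this system runs CUDA 13 (which ships `libcublas.so.13`), I suspect the bundled tensorrt_llm wheel was built against CUDA 12. A minimal probe that can be run inside the container (assuming Python is available there; the soname list is my assumption) to confirm which cuBLAS sonames the dynamic loader can actually resolve:

```python
import ctypes

# Probe which cuBLAS sonames the dynamic loader can resolve.
# tensorrt_llm.bindings needs libcublas.so.12; CUDA 13 ships libcublas.so.13,
# so on a CUDA 13 container the .12 soname is likely the one that is missing.
def find_cublas(sonames=("libcublas.so.12", "libcublas.so.13", "libcublas.so")):
    found = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)  # raises OSError if the soname is not resolvable
            found[name] = True
        except OSError:
            found[name] = False
    return found

if __name__ == "__main__":
    for name, ok in find_cublas().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If this reports `libcublas.so.13: found` but `libcublas.so.12: MISSING`, that would confirm a CUDA major-version mismatch between the tensorrt_llm wheel and the container's CUDA libraries.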
What I’ve tried:
- Both via-server images (non-sbsa and sbsa)
- Setting NVILA_USE_PYTORCH=1
Is there anything else I could try?