VSS Jetson Thor NVILA deployment libcublas.so.12 import error

Please provide the following information when creating a topic:

  • Hardware Platform (GPU model and numbers): Jetson Thor
  • Ubuntu Version: 24.04.3
  • NVIDIA GPU Driver Version (valid for GPU only): 580.00
  • Jetpack version: 7.0
  • CUDA version: 13.0

Hello, I’m attempting to deploy the NVILA model using the docker remote_llm_deployment setup on a Jetson Thor, but I’m getting the following errors:

```
via-server-1 | GPU has 2 decode engines
via-server-1 | Free GPU memory is [N/A] MiB
via-server-1 | /opt/nvidia/via/start_via.sh: line 77: [: [N/A]: integer expression expected
via-server-1 | Total GPU memory is 125772 MiB per GPU
via-server-1 | Auto-selecting VLM Batch Size to 128
via-server-1 | Using nvila
via-server-1 | Starting VIA server in release mode
via-server-1 | 2026-01-20 14:53:53,240 INFO Initializing VIA Stream Handler
via-server-1 | 2026-01-20 14:53:53,243 INFO {'gdino_engine': '/root/.via/ngc_model_cache//cv_pipeline_models/swin.fp16.engine', 'tracker_config': '/tmp/via_tracker_config.yml', 'inference_interval': 1}
via-server-1 | 2026-01-20 14:53:53,243 INFO Initializing VLM pipeline
via-server-1 | 2026-01-20 14:53:53,252 INFO Have peer access: True
via-server-1 | 2026-01-20 14:53:53,253 INFO Using model cached at /root/.via/ngc_model_cache/nvidia_tao_nvila-highres_nvila-lite-15b-highres-lita
via-server-1 | 2026-01-20 14:53:53,254 INFO GPUs per VLM instance: 1
via-server-1 | 2026-01-20 14:53:53,254 INFO num_vlm_procs set to 1
via-server-1 | INFO: Started server process [160]
via-server-1 | INFO: Waiting for application startup.
via-server-1 | INFO: Application startup complete.
via-server-1 | INFO: Uvicorn running on http://127.0.0.1:60000 (Press CTRL+C to quit)
via-server-1 | 2026-01-20 14:53:55,695 INFO Initializing VlmProcess-0
via-server-1 | 2026-01-20 14:53:55,695 INFO Initializing DecoderProcess-0
via-server-1 | Process VlmProcess-1:
via-server-1 | Traceback (most recent call last):
via-server-1 | File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
via-server-1 | self.run()
via-server-1 | File "/opt/nvidia/via/via-engine/vlm_pipeline/process_base.py", line 240, in run
via-server-1 | if not self._initialize():
via-server-1 | ^^^^^^^^^^^^^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 730, in _initialize
via-server-1 | self._model = NVila(
via-server-1 | ^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/models/nvila/nvila_model.py", line 44, in __init__
via-server-1 | import tensorrt_llm
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
via-server-1 | import tensorrt_llm.functional as functional
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/functional.py", line 28, in <module>
via-server-1 | from . import graph_rewriting as gw
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/graph_rewriting.py", line 11, in <module>
via-server-1 | from ._utils import trt_gte
via-server-1 | File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_utils.py", line 42, in <module>
via-server-1 | from tensorrt_llm.bindings import DataType, GptJsonConfig
via-server-1 | ImportError: libcublas.so.12: cannot open shared object file: No such file or directory
via-server-1 | 2026-01-20 14:53:58,787 INFO Warmup DecoderProcess-0
via-server-1 | /bin/dash: 1: lsmod: not found
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,150 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,498 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,748 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:53:59,982 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:54:00,220 INFO Video stream found.
via-server-1 | Opening in BLOCKING MODE
via-server-1 | 2026-01-20 14:54:00,462 INFO Video stream found.
via-server-1 | 2026-01-20 14:54:00,557 INFO Warmup DecoderProcess-0 done
via-server-1 | 2026-01-20 14:54:00,559 INFO Initialized DecoderProcess-0
via-server-1 | 2026-01-20 14:54:01,561 INFO Stopping VLM pipeline
via-server-1 | [W120 14:54:02.704658614 CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
via-server-1 | 2026-01-20 14:54:02,778 INFO Stopped VLM pipeline
via-server-1 | 2026-01-20 14:54:02,779 ERROR Failed to load VIA stream handler - Failed to load VLM on GPU 0
via-server-1 | Traceback (most recent call last):
via-server-1 | File "/opt/nvidia/via/via-engine/via_server.py", line 254, in run
via-server-1 | self._stream_handler = ViaStreamHandler(self._args)
via-server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 592, in __init__
via-server-1 | self._vlm_pipeline = VlmPipeline(args.asset_dir, args)
via-server-1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
via-server-1 | File "/opt/nvidia/via/via-engine/vlm_pipeline/vlm_pipeline.py", line 1560, in __init__
via-server-1 | raise Exception(f"Failed to load VLM on GPU {idx}")
via-server-1 | Exception: Failed to load VLM on GPU 0
via-server-1 |
via-server-1 | During handling of the above exception, another exception occurred:
via-server-1 |
via-server-1 | Traceback (most recent call last):
via-server-1 | File "/opt/nvidia/via/via-engine/via_server.py", line 2371, in <module>
via-server-1 | server.run()
via-server-1 | File "/opt/nvidia/via/via-engine/via_server.py", line 256, in run
via-server-1 | raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")
via-server-1 | via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - Failed to load VLM on GPU 0
via-server-1 | Killed process with PID 157
via-server-1 exited with code 1
```
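As a quick sanity check inside the container, you can also ask the dynamic linker whether it can locate libcublas at all. This is a minimal sketch using only the Python standard library; probing by the short name `cublas` is an assumption about how the library is registered with `ldconfig`:

```python
import ctypes.util

def can_find_library(name: str) -> bool:
    """Return True if the dynamic linker can locate a shared library by short name."""
    return ctypes.util.find_library(name) is not None

# On a working install this prints a truthy result; in the failing container,
# 'cublas' would not be found, matching the ImportError in the log.
print("libcublas visible:", can_find_library("cublas"))
```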

What I’ve tried:

  • Both via-server images (non-sbsa and sbsa)
  • Setting NVILA_USE_PYTORCH=1

Is there anything else I could try?

Please refer to this patch.

1. Since TensorRT-LLM does not support Jetson Thor, you can run VLM inference using PyTorch instead:

```diff
diff --git a/deploy/docker/remote_llm_deployment/.env b/deploy/docker/remote_llm_deployment/.env
index 97e849e..e0e17d2 100644
--- a/deploy/docker/remote_llm_deployment/.env
+++ b/deploy/docker/remote_llm_deployment/.env
@@ -1,5 +1,5 @@
-export NGC_API_KEY=abc123*** #FIXME - api key to pull model from NGC. Should come from ngc.nvidia.com
-export NVIDIA_API_KEY=nvapi-*** #api key to access NIM endpoints. Should come from build.nvidia.com
+export NGC_API_KEY=nvapi #FIXME - api key to pull model from NGC. Should come from ngc.nvidia.com
+export NVIDIA_API_KEY=nvapi #api key to access NIM endpoints. Should come from build.nvidia.com
 
 #Adjust ports if needed
 export FRONTEND_PORT=9100
@@ -27,10 +27,17 @@ export DISABLE_CV_PIPELINE=true
 export INSTALL_PROPRIETARY_CODECS=false # Set to true when enabling CV
 
 #Set VLM to Cosmos-Reason1
-export VLM_MODEL_TO_USE=cosmos-reason1
-export MODEL_PATH=ngc:nim/nvidia/cosmos-reason1-7b:1.1-fp8-dynamic
+# export VLM_MODEL_TO_USE=cosmos-reason1
+# export MODEL_PATH=ngc:nim/nvidia/cosmos-reason1-7b:1.1-fp8-dynamic
+
+export VLM_MODEL_TO_USE=nvila
+export MODEL_PATH=git:https://huggingface.co/Efficient-Large-Model/NVILA-15B
+export NVILA_VIDEO_MAX_TILES=1
+export NVILA_USE_PYTORCH=true
 
 #Adjust misc configs if needed
 export DISABLE_GUARDRAILS=false
 export NVIDIA_VISIBLE_DEVICES=0 #For H100 Deployment
 
 # cache model to host, avoid download again and again
+export NGC_MODEL_CACHE=xxxx/.cache/.vss/ngc_model_cache
+export TRT_ENGINE_PATH=xxxx/.cache/.vss/trt_engine_cache
 #export NVIDIA_VISIBLE_DEVICES=0,1,2 #For L40S Deployment
diff --git a/deploy/docker/remote_llm_deployment/compose.yaml b/deploy/docker/remote_llm_deployment/compose.yaml
index 1a13613..ab0ef34 100644
--- a/deploy/docker/remote_llm_deployment/compose.yaml
+++ b/deploy/docker/remote_llm_deployment/compose.yaml
@@ -39,8 +39,10 @@ services:
       - via-hf-cache:/tmp/huggingface
       - "${CV_PIPELINE_TRACKER_CONFIG:-/dummy}${CV_PIPELINE_TRACKER_CONFIG:+:/opt/nvidia/via/config/default_tracker_config.yml}"
       - "${ALERT_REVIEW_MEDIA_BASE_DIR:-/dummy}${ALERT_REVIEW_MEDIA_BASE_DIR:+:${ALERT_REVIEW_MEDIA_BASE_DIR:-}}"
+      - xxxxx/xvideo-search-and-summarization/src/vss-engine/src:/opt/nvidia/via/via-engine
 
     environment:
+      NVILA_USE_PYTORCH: "${NVILA_USE_PYTORCH:-true}"
       AZURE_OPENAI_API_KEY: "${AZURE_OPENAI_API_KEY:-}"
       AZURE_OPENAI_ENDPOINT: "${AZURE_OPENAI_ENDPOINT:-}"
       BACKEND_PORT: "${BACKEND_PORT?}"
```

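If you want to confirm that the flag actually reaches the engine process inside the container, shell-style boolean env vars are typically parsed along these lines. Note this is a sketch of the usual convention; the exact parsing inside the VSS engine is an assumption on my part:

```python
import os

def flag_enabled(name: str, env=None) -> bool:
    """Interpret a shell-style boolean env var: '1', 'true', or 'yes' count as enabled."""
    env = os.environ if env is None else env
    return env.get(name, "false").strip().lower() in ("1", "true", "yes")

# Inside the via-server container this should report True after the compose change.
print("NVILA_USE_PYTORCH:", flag_enabled("NVILA_USE_PYTORCH"))
```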
2. You may encounter insufficient memory issues on a Jetson Thor. Please use this script to clear the system cache first:

```
cd video-search-and-summarization/deploy/scripts
sudo ./sys_cache_cleaner.sh
```

3. I strongly recommend you use Cosmos-Reason1 instead. NVILA is an older VLM and may not be supported in the future.


It works. Thanks!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.