VSS local deployment on a single GPU: Failed to load VIA stream handler - Guardrails / CA-RAG setup failed

O13K511 · July 3, 2025, 2:08pm

Hardware Platform (GPU model and numbers): H100
System Memory: 320
Ubuntu Version: 22.04
NVIDIA GPU Driver Version (valid for GPU only): 570.124.06
Issue Type: bug
How to reproduce the issue: docker compose up

Hi! I have an issue starting the VSS pipeline using the local deployment on a single GPU.
The VSS pipeline doesn't start due to errors in Guardrails and CA-RAG.

Guardrails

via-server-1  |  Ctrl+C to exit ...
via-server-1  | Using nvila
via-server-1  | Starting VIA server in release mode
via-server-1  | 2025-07-03 14:04:49,612 INFO Initializing VIA Stream Handler
via-server-1  | INFO:     Started server process [230]
via-server-1  | INFO:     Waiting for application startup.
via-server-1  | INFO:     Application startup complete.
via-server-1  | INFO:     Uvicorn running on http://127.0.0.1:60000 (Press CTRL+C to quit)
via-server-1  | Exception in thread Thread-3 (run):
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 174, in _new_conn
via-server-1  |     conn = connection.create_connection(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 95, in create_connection
via-server-1  |     raise err
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 85, in create_connection
via-server-1  |     sock.connect(sa)
via-server-1  | ConnectionRefusedError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 715, in urlopen
via-server-1  |     httplib_response = self._make_request(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
via-server-1  |     conn.request(method, url, **httplib_request_kw)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 244, in request
via-server-1  |     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1283, in request
via-server-1  |     self._send_request(method, url, body, headers, encode_chunked)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
via-server-1  |     self.endheaders(body, encode_chunked=encode_chunked)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
via-server-1  |     self._send_output(message_body, encode_chunked=encode_chunked)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
via-server-1  |     self.send(msg)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 976, in send
via-server-1  |     self.connect()
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 205, in connect
via-server-1  |     conn = self._new_conn()
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 186, in _new_conn
via-server-1  |     raise NewConnectionError(
via-server-1  | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7229577a6290>: Failed to establish a new connection: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 667, in send
via-server-1  |     resp = conn.urlopen(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 801, in urlopen
via-server-1  |     retries = retries.increment(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 594, in increment
via-server-1  |     raise MaxRetryError(_pool, url, error or ResponseError(cause))
via-server-1  | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=8006): Max retries exceeded with url: /v1/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6290>: Failed to establish a new connection: [Errno 111] Connection refused'))
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
via-server-1  |     self.run()
via-server-1  |   File "/usr/lib/python3.10/threading.py", line 953, in run
via-server-1  |     self._target(*self._args, **self._kwargs)
via-server-1  |   File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
via-server-1  |     return loop.run_until_complete(main)
via-server-1  |   File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
via-server-1  |     return future.result()
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/generation.py", line 134, in init
via-server-1  |     await asyncio.gather(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/generation.py", line 261, in _init_bot_message_index
via-server-1  |     await self.bot_message_index.add_items(items)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/embeddings/basic.py", line 185, in add_items
via-server-1  |     await self._get_embeddings([item.text for item in items])
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/embeddings/cache.py", line 307, in wrapper_decorator
via-server-1  |     return await func(self, texts)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/embeddings/basic.py", line 156, in _get_embeddings
via-server-1  |     embeddings = await self._model.encode_async(texts)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/embeddings/providers/nim.py", line 59, in encode_async
via-server-1  |     result = await self.document_embedder.aembed_documents(documents)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/embeddings/embeddings.py", line 67, in aembed_documents
via-server-1  |     return await run_in_executor(None, self.embed_documents, texts)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/config.py", line 588, in run_in_executor
via-server-1  |     return await asyncio.get_running_loop().run_in_executor(
via-server-1  |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
via-server-1  |     result = self.fn(*self.args, **self.kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/config.py", line 579, in wrapper
via-server-1  |     return func(*args, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 163, in embed_documents
via-server-1  |     all_embeddings.extend(self._embed(batch, model_type="passage"))
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_nvidia_ai_endpoints/embeddings.py", line 137, in _embed
via-server-1  |     response = self._client.get_req(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_nvidia_ai_endpoints/_common.py", line 473, in get_req
via-server-1  |     response, session = self._post(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_nvidia_ai_endpoints/_common.py", line 366, in _post
via-server-1  |     self.last_response = response = session.post(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 637, in post
via-server-1  |     return self.request("POST", url, data=data, json=json, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 589, in request
via-server-1  |     resp = self.send(prep, **send_kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 703, in send
via-server-1  |     r = adapter.send(request, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 700, in send
via-server-1  |     raise ConnectionError(e, request=request)
via-server-1  | requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8006): Max retries exceeded with url: /v1/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6290>: Failed to establish a new connection: [Errno 111] Connection refused'))
via-server-1  | 2025-07-03 14:04:50,241 ERROR Error in guardrails: LLM Call Exception: HTTPConnectionPool(host='0.0.0.0', port=8007): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6fb0>: Failed to establish a new connection: [Errno 111] Connection refused'))
via-server-1  | 2025-07-03 14:04:50,241 INFO Stopping VIA Stream Handler
via-server-1  | 2025-07-03 14:04:50,241 INFO Stopped VIA Stream Handler
via-server-1  | 2025-07-03 14:04:50,241 ERROR Failed to load VIA stream handler - Guardrails failed
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 174, in _new_conn
via-server-1  |     conn = connection.create_connection(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 95, in create_connection
via-server-1  |     raise err
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 85, in create_connection
via-server-1  |     sock.connect(sa)
via-server-1  | ConnectionRefusedError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 715, in urlopen
via-server-1  |     httplib_response = self._make_request(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
via-server-1  |     conn.request(method, url, **httplib_request_kw)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 244, in request
via-server-1  |     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1283, in request
via-server-1  |     self._send_request(method, url, body, headers, encode_chunked)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
via-server-1  |     self.endheaders(body, encode_chunked=encode_chunked)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
via-server-1  |     self._send_output(message_body, encode_chunked=encode_chunked)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
via-server-1  |     self.send(msg)
via-server-1  |   File "/usr/lib/python3.10/http/client.py", line 976, in send
via-server-1  |     self.connect()
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 205, in connect
via-server-1  |     conn = self._new_conn()
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 186, in _new_conn
via-server-1  |     raise NewConnectionError(
via-server-1  | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7229577a6fb0>: Failed to establish a new connection: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 667, in send
via-server-1  |     resp = conn.urlopen(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 801, in urlopen
via-server-1  |     retries = retries.increment(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 594, in increment
via-server-1  |     raise MaxRetryError(_pool, url, error or ResponseError(cause))
via-server-1  | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=8007): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6fb0>: Failed to establish a new connection: [Errno 111] Connection refused'))
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/utils.py", line 92, in llm_call
via-server-1  |     result = await llm.agenerate_prompt(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 796, in agenerate_prompt
via-server-1  |     return await self.agenerate(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 756, in agenerate
via-server-1  |     raise exceptions[0]
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 924, in _agenerate_with_cache
via-server-1  |     result = await self._agenerate(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 964, in _agenerate
via-server-1  |     return await run_in_executor(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/config.py", line 588, in run_in_executor
via-server-1  |     return await asyncio.get_running_loop().run_in_executor(
via-server-1  |   File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
via-server-1  |     result = self.fn(*self.args, **self.kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/config.py", line 579, in wrapper
via-server-1  |     return func(*args, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/llm/providers/_langchain_nvidia_ai_endpoints_patch.py", line 45, in wrapper
via-server-1  |     return generate_from_stream(stream_iter)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 88, in generate_from_stream
via-server-1  |     generation = next(stream, None)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_nvidia_ai_endpoints/chat_models.py", line 420, in _stream
via-server-1  |     for response in self._client.get_req_stream(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_nvidia_ai_endpoints/_common.py", line 560, in get_req_stream
via-server-1  |     response = self.get_session_fn().post(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 637, in post
via-server-1  |     return self.request("POST", url, data=data, json=json, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 589, in request
via-server-1  |     resp = self.send(prep, **send_kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 703, in send
via-server-1  |     r = adapter.send(request, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 700, in send
via-server-1  |     raise ConnectionError(e, request=request)
via-server-1  | requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8007): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6fb0>: Failed to establish a new connection: [Errno 111] Connection refused'))
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 507, in _create_llm_rails_pool
via-server-1  |     response = self._LLMRailsPool[0].generate(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/rails/llm/llmrails.py", line 990, in generate
via-server-1  |     return loop.run_until_complete(
via-server-1  |   File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
via-server-1  |     return future.result()
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/rails/llm/llmrails.py", line 703, in generate_async
via-server-1  |     new_events = await self.runtime.generate_events(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/colang/v1_0/runtime/runtime.py", line 167, in generate_events
via-server-1  |     next_events = await self._process_start_action(events)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/colang/v1_0/runtime/runtime.py", line 363, in _process_start_action
via-server-1  |     result, status = await self.action_dispatcher.execute_action(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 253, in execute_action
via-server-1  |     raise e
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/action_dispatcher.py", line 214, in execute_action
via-server-1  |     result = await result
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/library/self_check/input_check/actions.py", line 72, in self_check_input
via-server-1  |     response = await llm_call(llm, prompt, stop=stop)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/nemoguardrails/actions/llm/utils.py", line 96, in llm_call
via-server-1  |     raise LLMCallException(e)
via-server-1  | nemoguardrails.actions.llm.utils.LLMCallException: LLM Call Exception: HTTPConnectionPool(host='0.0.0.0', port=8007): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6fb0>: Failed to establish a new connection: [Errno 111] Connection refused'))
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 1368, in run
via-server-1  |     self._stream_handler = ViaStreamHandler(self._args)
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 409, in __init__
via-server-1  |     self._create_llm_rails_pool()
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 513, in _create_llm_rails_pool
via-server-1  |     raise Exception("Guardrails failed")
via-server-1  | Exception: Guardrails failed
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 2880, in <module>
via-server-1  |     server.run()
via-server-1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 1370, in run
via-server-1  |     raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")
via-server-1  | via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - Guardrails failed
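
The Guardrails log above ends in `Connection refused` for `0.0.0.0:8006` (`/v1/embeddings`) and `0.0.0.0:8007` (`/v1/chat/completions`), so one quick check is whether anything is listening on those ports from where via-server runs. Below is a minimal sketch of such a check. It assumes it is run in the same network namespace as via-server (for example, inside that container). `port_open` and `check_endpoints` are hypothetical helper names used only for illustration and are not part of VSS.

```python
import socket

# Host and ports are copied from the log above; the labels are only descriptive.
ENDPOINTS = {
    "embeddings (/v1/embeddings)": ("0.0.0.0", 8006),
    "LLM (/v1/chat/completions)": ("0.0.0.0", 8007),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints=ENDPOINTS):
    """Map each endpoint label to whether its port accepts connections."""
    return {name: port_open(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_endpoints().items():
        status = "reachable" if ok else "connection refused or timed out"
        print(f"{name}: {status}")
```

If both ports report as unreachable, the embedding and LLM services are probably not running or not exposed at that address. This only tests TCP reachability, not whether the services behind the ports are healthy.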

CA-RAG

via-server-1  | 2025-07-03 13:54:20,603 INFO Warming up LLM meta/llama-3.1-8b-instruct
via-server-1  | 2025-07-03 13:54:23,345 ERROR Error warming up LLM meta/llama-3.1-8b-instruct: Connection error.
via-server-1  | 2025-07-03 13:54:23,345 ERROR Error warming up LLM: Connection error.
via-server-1  | 2025-07-03 13:54:23,345 ERROR Exception Connection error.
via-server-1  | 2025-07-03 13:54:23,348 ERROR Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 10, in map_exceptions
via-server-1  |     yield
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/sync.py", line 100, in connect_tcp
via-server-1  |     sock = socket.create_connection(
via-server-1  |   File "/usr/lib/python3.10/socket.py", line 845, in create_connection
via-server-1  |     raise err
via-server-1  |   File "/usr/lib/python3.10/socket.py", line 833, in create_connection
via-server-1  |     sock.connect(sa)
via-server-1  | ConnectionRefusedError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
via-server-1  |     yield
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 218, in handle_request
via-server-1  |     resp = self._pool.handle_request(req)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 262, in handle_request
via-server-1  |     raise exc
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 245, in handle_request
via-server-1  |     response = connection.handle_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 92, in handle_request
via-server-1  |     raise exc
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 69, in handle_request
via-server-1  |     stream = self._connect(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 117, in _connect
via-server-1  |     stream = self._network_backend.connect_tcp(**kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/sync.py", line 99, in connect_tcp
via-server-1  |     with map_exceptions(exc_map):
via-server-1  |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
via-server-1  |     self.gen.throw(typ, value, traceback)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 14, in map_exceptions
via-server-1  |     raise to_exc(exc) from exc
via-server-1  | httpcore.ConnectError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 972, in _request
via-server-1  |     response = self._client.send(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 901, in send
via-server-1  |     response = self._send_handling_auth(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 929, in _send_handling_auth
via-server-1  |     response = self._send_handling_redirects(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 966, in _send_handling_redirects
via-server-1  |     response = self._send_single_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1002, in _send_single_request
via-server-1  |     response = transport.handle_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 217, in handle_request
via-server-1  |     with map_httpcore_exceptions():
via-server-1  |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
via-server-1  |     self.gen.throw(typ, value, traceback)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
via-server-1  |     raise mapped_exc(message) from exc
via-server-1  | httpx.ConnectError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 10, in map_exceptions
via-server-1  |     yield
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/sync.py", line 100, in connect_tcp
via-server-1  |     sock = socket.create_connection(
via-server-1  |   File "/usr/lib/python3.10/socket.py", line 845, in create_connection
via-server-1  |     raise err
via-server-1  |   File "/usr/lib/python3.10/socket.py", line 833, in create_connection
via-server-1  |     sock.connect(sa)
via-server-1  | ConnectionRefusedError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
via-server-1  |     yield
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 218, in handle_request
via-server-1  |     resp = self._pool.handle_request(req)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 262, in handle_request
via-server-1  |     raise exc
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 245, in handle_request
via-server-1  |     response = connection.handle_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 92, in handle_request
via-server-1  |     raise exc
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 69, in handle_request
via-server-1  |     stream = self._connect(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 117, in _connect
via-server-1  |     stream = self._network_backend.connect_tcp(**kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/sync.py", line 99, in connect_tcp
via-server-1  |     with map_exceptions(exc_map):
via-server-1  |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
via-server-1  |     self.gen.throw(typ, value, traceback)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 14, in map_exceptions
via-server-1  |     raise to_exc(exc) from exc
via-server-1  | httpcore.ConnectError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 972, in _request
via-server-1  |     response = self._client.send(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 901, in send
via-server-1  |     response = self._send_handling_auth(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 929, in _send_handling_auth
via-server-1  |     response = self._send_handling_redirects(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 966, in _send_handling_redirects
via-server-1  |     response = self._send_single_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1002, in _send_single_request
via-server-1  |     response = transport.handle_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 217, in handle_request
via-server-1  |     with map_httpcore_exceptions():
via-server-1  |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
via-server-1  |     self.gen.throw(typ, value, traceback)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
via-server-1  |     raise mapped_exc(message) from exc
via-server-1  | httpx.ConnectError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 10, in map_exceptions
via-server-1  |     yield
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/sync.py", line 100, in connect_tcp
via-server-1  |     sock = socket.create_connection(
via-server-1  |   File "/usr/lib/python3.10/socket.py", line 845, in create_connection
via-server-1  |     raise err
via-server-1  |   File "/usr/lib/python3.10/socket.py", line 833, in create_connection
via-server-1  |     sock.connect(sa)
via-server-1  | ConnectionRefusedError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
via-server-1  |     yield
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 218, in handle_request
via-server-1  |     resp = self._pool.handle_request(req)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 262, in handle_request
via-server-1  |     raise exc
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection_pool.py", line 245, in handle_request
via-server-1  |     response = connection.handle_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 92, in handle_request
via-server-1  |     raise exc
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 69, in handle_request
via-server-1  |     stream = self._connect(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_sync/connection.py", line 117, in _connect
via-server-1  |     stream = self._network_backend.connect_tcp(**kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_backends/sync.py", line 99, in connect_tcp
via-server-1  |     with map_exceptions(exc_map):
via-server-1  |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
via-server-1  |     self.gen.throw(typ, value, traceback)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 14, in map_exceptions
via-server-1  |     raise to_exc(exc) from exc
via-server-1  | httpcore.ConnectError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 972, in _request
via-server-1  |     response = self._client.send(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 901, in send
via-server-1  |     response = self._send_handling_auth(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 929, in _send_handling_auth
via-server-1  |     response = self._send_handling_redirects(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 966, in _send_handling_redirects
via-server-1  |     response = self._send_single_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 1002, in _send_single_request
via-server-1  |     response = transport.handle_request(request)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 217, in handle_request
via-server-1  |     with map_httpcore_exceptions():
via-server-1  |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
via-server-1  |     self.gen.throw(typ, value, traceback)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
via-server-1  |     raise mapped_exc(message) from exc
via-server-1  | httpx.ConnectError: [Errno 111] Connection refused
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/context_manager/context_manager.py", line 119, in run
via-server-1  |     self._initialize()
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/context_manager/context_manager.py", line 93, in _initialize
via-server-1  |     self.cm_handler = ContextManagerHandler(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/context_manager/context_manager_handler.py", line 103, in __init__
via-server-1  |     self.configure_init(config, req_info)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/context_manager/context_manager_handler.py", line 175, in configure_init
via-server-1  |     notification_llm = ChatOpenAITool(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/tools/llm/llm_handler.py", line 109, in __init__
via-server-1  |     self.warmup(model)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/tools/llm/llm_handler.py", line 117, in warmup
via-server-1  |     logger.info(str(self.invoke("Hello, world!")))
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/tools/llm/llm_handler.py", line 45, in invoke
via-server-1  |     return self.llm.invoke(*args, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/runnables/configurable.py", line 133, in invoke
via-server-1  |     return runnable.invoke(input, config, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 286, in invoke
via-server-1  |     self.generate_prompt(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 786, in generate_prompt
via-server-1  |     return self.generate(prompt_messages, stop=stop, callbacks=callbacks, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 643, in generate
via-server-1  |     raise e
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 633, in generate
via-server-1  |     self._generate_with_cache(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_core/language_models/chat_models.py", line 851, in _generate_with_cache
via-server-1  |     result = self._generate(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/langchain_openai/chat_models/base.py", line 683, in _generate
via-server-1  |     response = self.client.create(**payload)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_utils/_utils.py", line 274, in wrapper
via-server-1  |     return func(*args, **kwargs)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py", line 668, in create
via-server-1  |     return self._post(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1259, in post
via-server-1  |     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 936, in request
via-server-1  |     return self._request(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 996, in _request
via-server-1  |     return self._retry_request(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1074, in _retry_request
via-server-1  |     return self._request(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 996, in _request
via-server-1  |     return self._retry_request(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1074, in _retry_request
via-server-1  |     return self._request(
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1006, in _request
via-server-1  |     raise APIConnectionError(request=request) from err
via-server-1  | openai.APIConnectionError: Connection error.
via-server-1  | 
via-server-1  | 2025-07-03 13:54:24,578 ERROR Error initializing Context Manager: Failed to load Context Manager Process no.: 0
via-server-1  | 2025-07-03 13:54:24,578 INFO Stopping VIA Stream Handler
via-server-1  | 2025-07-03 13:54:24,578 INFO Stopping VLM pipeline
via-server-1  | 2025-07-03 13:54:25,578 INFO Stopped VLM pipeline
via-server-1  | 2025-07-03 13:54:25,578 INFO Stopped VIA Stream Handler
via-server-1  | 2025-07-03 13:54:25,579 ERROR Traceback (most recent call last):
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 480, in __init__
via-server-1  |     self._create_ctx_mgr_pool(config)
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 536, in _create_ctx_mgr_pool
via-server-1  |     ContextManager(config=config, process_index=self.num_ctx_mgr)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/context_manager/context_manager.py", line 278, in __init__
via-server-1  |     raise Exception(
via-server-1  | Exception: Failed to load Context Manager Process no.: 0
via-server-1  | 
via-server-1  | 2025-07-03 13:54:25,579 ERROR Failed to load VIA stream handler - CA-RAG setup failed.
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 480, in __init__
via-server-1  |     self._create_ctx_mgr_pool(config)
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 536, in _create_ctx_mgr_pool
via-server-1  |     ContextManager(config=config, process_index=self.num_ctx_mgr)
via-server-1  |   File "/usr/local/lib/python3.10/dist-packages/vss_ctx_rag/context_manager/context_manager.py", line 278, in __init__
via-server-1  |     raise Exception(
via-server-1  | Exception: Failed to load Context Manager Process no.: 0
via-server-1  | 
via-server-1  | The above exception was the direct cause of the following exception:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 1368, in run
via-server-1  |     self._stream_handler = ViaStreamHandler(self._args)
via-server-1  |   File "/opt/nvidia/via/via-engine/via_stream_handler.py", line 485, in __init__
via-server-1  |     raise (ValueError("CA-RAG setup failed.")) from e
via-server-1  | ValueError: CA-RAG setup failed.
via-server-1  | 
via-server-1  | During handling of the above exception, another exception occurred:
via-server-1  | 
via-server-1  | Traceback (most recent call last):
via-server-1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 2880, in <module>
via-server-1  |     server.run()
via-server-1  |   File "/opt/nvidia/via/via-engine/via_server.py", line 1370, in run
via-server-1  |     raise ViaException(f"Failed to load VIA stream handler - {str(ex)}")
via-server-1  | via_exception.ViaException: ViaException - code: InternalServerError message: Failed to load VIA stream handler - CA-RAG setup failed.
via-server-1  | Killed process with PID 83

The LLM (meta/llama-3.1-8b-instruct) is reachable and responds to a POST request from the shell, base_url in config.yaml is updated as required, and docker commands run without sudo.

With Guardrails and CA-RAG disabled, the VSS pipeline starts.

Thanks in advance!

yuweiw · July 4, 2025, 3:06am

O13K511:

via-server-1  | 2025-07-03 13:54:20,603 INFO Warming up LLM meta/llama-3.1-8b-instruct
via-server-1  | 2025-07-03 13:54:23,345 ERROR Error warming up LLM meta/llama-3.1-8b-instruct: Connection error.
via-server-1  | 2025-07-03 13:54:23,345 ERROR Error warming up LLM: Connection error.
via-server-1  | 2025-07-03 13:54:23,345 ERROR Exception Connection error.
via-server-1  | 2025-07-03 13:54:23,348 ERROR Traceback (most recent call last):

1. Since you have confirmed that the LLM deployment was successful but the log shows that the LLM cannot be reached, first check whether any of the ports is already occupied, as described in our FAQ below.
https://forums.developer.nvidia.com/t/vss-faq/328730/12
2. Did you deploy exactly according to our fully-local-deployment-single-gpu instructions?

O13K511 · July 4, 2025, 9:02am

Hi! Thanks for the reply.

I will check the ports according to the FAQ.
Yes, I followed the instructions from the link and verified via curl that all three items are accessible, as the guide describes:

LLM endpoint at port 8007 for VSS
embedding endpoint at port 8006 for VSS
reranker endpoint at port 8005 for VSS

The ports in the error stack trace also match:

via-server-1  | requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=8006): Max retries exceeded with url: /v1/embeddings (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6290>: Failed to establish a new connection: [Errno 111] Connection refused'))
via-server-1  | 2025-07-03 14:04:50,241 ERROR Error in guardrails: LLM Call Exception: HTTPConnectionPool(host='0.0.0.0', port=8007): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7229577a6fb0>: Failed to establish a new connection: [Errno 111] Connection refused'))
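
For anyone debugging the same "Connection refused" errors: a curl check from the host shell and a request made from inside the via-server container do not necessarily hit the same network namespace, so the endpoints can look fine from the host and still be unreachable from the container. Below is a minimal sketch of a TCP probe that could be run inside the container to compare addresses. The helper names `port_is_open` and `probe` are made up for this example, and the host list is an assumption (the address from the log plus Docker's `host.docker.internal` alias, which only resolves where that alias is defined).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def probe(hosts, ports):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(h, p): port_is_open(h, p) for h in hosts for p in ports}


if __name__ == "__main__":
    # Assumed hosts to compare; the ports are the ones listed in this thread.
    hosts = ["0.0.0.0", "host.docker.internal"]
    ports = [8005, 8006, 8007]
    for (host, port), ok in probe(hosts, ports).items():
        print(f"{host}:{port} -> {'open' if ok else 'unreachable'}")
```

If a port shows as open from the host but unreachable from inside the container, the problem is likely in how the container resolves or routes to the host rather than in the NIM services themselves.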

O13K511 · July 4, 2025, 3:06pm

OK, figured it out.

On Ubuntu, you need to add --add-host=host.docker.internal:host-gateway to the docker run command when starting the LLM, Embedding NIM, and Reranker NIM containers.
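
As a rough illustration of where that flag goes, here is a small sketch that only assembles and prints the docker run command lines; it does not start anything. Everything except the `--add-host` flag itself is an assumption for this example: the function name `build_nim_run_cmd`, the placeholder image names, `--gpus all`, the container port 8000, and the assignment of port 8005 to the reranker. Use the exact commands from the deployment guide and only add the flag to them.

```python
import shlex

# The flag reported in this thread as the fix on Ubuntu.
HOST_ALIAS_FLAG = "--add-host=host.docker.internal:host-gateway"


def build_nim_run_cmd(image: str, host_port: int, container_port: int = 8000) -> list[str]:
    """Assemble a docker run argument list that includes the host alias flag.

    container_port=8000 and --gpus all are assumptions for illustration.
    """
    return [
        "docker", "run", "--rm", "--gpus", "all",
        HOST_ALIAS_FLAG,
        "-p", f"{host_port}:{container_port}",
        image,
    ]


if __name__ == "__main__":
    # Placeholder image names (hypothetical); host ports as listed in the thread.
    services = {
        "<llm-nim-image>": 8007,
        "<embedding-nim-image>": 8006,
        "<reranker-nim-image>": 8005,
    }
    for image, port in services.items():
        print(shlex.join(build_nim_run_cmd(image, port)))
```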

system · July 18, 2025, 3:07pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
