Connection Refused to ports 8000 and 9234 while running VSS blueprint

I am trying to run the VSS blueprint on a g6e.24xlarge EC2 instance via the docker deployment.

Since a bug was reported in downloading the vila-1.5 model from within the containers, the recommendation was to either switch to NVILA or download the vila-1.5 model manually and then mount it into the container.

I chose the first option and switched to the NVILA model. docker compose comes up, but it errors out saying it cannot connect to applications that are supposed to be running on ports 8000 and 9234.
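
The relevant errors can be surfaced from the compose output like this ( a broad grep, since I have not pinned down which service emits them ):

  # Surface the connection errors and any mention of the two ports
  docker compose logs --no-color 2>&1 | grep -iE "connection refused|:8000|:9234"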

I cannot determine whether this is caused by switching the VLM or whether I am doing something wrong.

System Information

  • Hardware Platform (GPU model and numbers) : g6e.24xlarge AWS EC2 instance ( 96 vCPUs; 768 GiB RAM; 4x NVIDIA L40S GPU )
  • System Memory : 768 GiB
  • Ubuntu Version : 24.04
  • NVIDIA GPU Driver Version : 535.183.01
  • Issue Type : Potential Bug
  • container logs

Steps to Reproduce

  1. Download the AI Blueprint repository
  2. cd docker/remote_llm_deployment
  3. Edit .env with the appropriate values
    • NGC_API_KEY
    • NVIDIA_API_KEY
  4. Change the following values in .env ( the full edit is sketched after this list )
    • VLM_MODEL_TO_USE=nvila
    • MODEL_PATH=git:https://huggingface.co/Efficient-Large-Model/NVILA-15B
  5. docker compose up
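
For completeness, a sketch of the .env edits from steps 3 and 4 ( key values redacted; everything else left at its stock value ):

  # The only lines I changed in docker/remote_llm_deployment/.env
  NGC_API_KEY=<redacted>
  NVIDIA_API_KEY=<redacted>

  # Switch the VLM from vila-1.5 to NVILA
  VLM_MODEL_TO_USE=nvila
  MODEL_PATH=git:https://huggingface.co/Efficient-Large-Model/NVILA-15B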

Hi @shinen , we will analyze this as soon as possible. Also, which reference did you follow for this deployment method?

The aim was to get the blueprint running within our own infrastructure. While exploring the AI blueprint repository, we thought the docker-compose-based deployment would work well for us, but it is not as straightforward as we expected.

OK. Could you first check your security groups, referring to post #9?

Does that post not refer to authentication to NIMs? Is that relevant to this issue?

I do not think Security Groups should be the problem, because the whole setup runs inside docker on a dedicated docker network. No container was exposing port 8000; I am not sure about port 9234. I will have to get back onto the instance and check whether any container exposes port 9234.
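
For reference, this is roughly how I intend to check ( the network name is a placeholder, not something from the blueprint docs ):

  # Which containers, if any, publish these ports on the host?
  docker ps --filter "publish=8000" --format "{{.Names}}  {{.Ports}}"
  docker ps --filter "publish=9234" --format "{{.Names}}  {{.Ports}}"

  # Inspect the compose network to see which containers are attached
  # ( take the actual name from `docker network ls` )
  docker network inspect <blueprint-network>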

I will also expose ports 8000 and 9234 on the Security Groups and see if that resolves the issue.

I will report my observations of this experiment within a few hours.

I followed the Deploy_VSS_docker_Crusoe.ipynb to figure out which step I was missing.

And the step I was missing is described in the notebook, right at the top of the Deployment section:

We will be using Cosmos Nemotron VLM, which is part of the main container. All other models need to be set up before proceeding with the blueprint container. These include:

Now, in the launchable, I only had to add one NGC_API_KEY at the top of the notebook and was hands-off for the rest of it ( apart from executing the cells ). That makes me assume the NIMs should be downloadable with the same NGC_API_KEY. However, when I tried to download the NIMs ( docker pull ) with the same NGC_API_KEY I used for the blueprint container, all 3 containers failed to download with the error:

Error response from daemon: pull access denied for nvcr.io/nim/<nim-image-uri>, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
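
For reference, this is the login + pull sequence I was running ( logging in to nvcr.io with the literal username $oauthtoken and the NGC API key as the password is the standard NGC procedure; the image URI is elided above, so it stays a placeholder here ):

  # nvcr.io uses the literal username "$oauthtoken" with the NGC API key as password
  echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

  # Retry the pull ( placeholder URI )
  docker pull nvcr.io/nim/<nim-image-uri>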

So now I understand why you linked the other post about checking subscriptions. My Enterprise account lists only the VSS EA and not the Developer Program. Is that why I am unable to download the NIMs?
However, that does not explain why the launchable did not have the same problem.


For clarity, my initial problem was that the blueprint could not connect to applications expected on ports 8000 and 9234.

The application that is supposed to be running on port 8000 is the LLM NIM, and the one that is supposed to be running on port 9234 is the embedding NIM.
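
Once those NIMs are actually up, I would expect probes like the following to succeed ( the /v1/models and /v1/health/ready paths are my assumption, based on the NIM containers exposing an OpenAI-compatible API, and the host port mapping may differ in the compose file ):

  # LLM NIM expected on port 8000 ( OpenAI-compatible API )
  curl -s http://localhost:8000/v1/models

  # Embedding NIM expected on port 9234
  curl -sf http://localhost:9234/v1/health/ready && echo "embedding NIM ready"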


Now, the problem is that I cannot figure out how to download the LLM, Embedding and Re-ranker NIMs using the same NGC_API_KEY.

Yes, that might be the reason. Also, keep an eye on whether your VSS subscription has expired. Since your initial log shows a problem downloading the model, your initial issue may also be caused by missing permissions:

[03/07/2025-15:21:43] [TRT-LLM] [E] Failed to load tokenizer from /root/.via/ngc_model_cache/NVILA-15B

Could you please point me to how to get the Developer Program listed in my NGC subscriptions?

Like the user in Build a Video Search and Summarization Agent - #10 by rafael54 , my developer.nvidia.com account lists the Developer membership, but it's not reflected on org.ngc.nvidia.com.

My subscription is valid until 2025-05-31.

That was due to my NGC_API_KEY being from the wrong account. I was attempting to use the API key from my personal account rather than the Enterprise account. Resolving that allowed me to download the blueprint containers.

I had assumed that was because I switched to the NVILA VLM rather than vila-1.5.
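
For anyone hitting the same thing: a quick way to catch a wrong-account key, assuming the NGC CLI is installed ( double-check the subcommands against your CLI version ):

  # Point the NGC CLI at the key you intend to use; it prompts for org/team
  ngc config set

  # Show which org/team the CLI is now configured against
  ngc config current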

We will confirm this process ASAP.