Getting Error while running blueprint-vss demo

kelvin.lwin · January 1, 2025, 1:18am

It runs on 8xL40S since this 3rd configuration was added recently.
That instance is in launchpad so strangely ‘sudo microk8s’ has a lot of issues but running straight with helm and kubectl worked fine.

Also on the 4xH100, when we took out the LLM everything is “running”. Just need to get authorization figured out between local and llama-3.1-70b-instruct Model by Meta | NVIDIA NIM via https://integrate.api.nvidia.com/v1 I believe. Seeing two kinds of errors of connection refused and unauthorized in different trials.

yuweiw · January 2, 2025, 1:59am

Because the 8xL40S can run normally, it can be confirmed that the problem before is caused by the insufficient resources.

The LLM occupies about half of all the resources, so that everything is OK after taking out that. Could you try to configure-the-nims to setup some keys for the model to verify the connection refused and unauthorized issue?

kelvin.lwin · January 2, 2025, 2:11am

But I’m using the same exact NIM nvcr.io/nim/meta/llama-3.1-70b-instruct so not clear to me what I should change according to Configure the NIMs — Video Search and Summarization Agent?

I tried to set NVIDIA_API_KEY but maybe I need to set OPENAI_API_KEY also since it’s using openai-compat.

aryason · January 2, 2025, 2:52am

@kelvin.lwin When you are trying to run on 4xH100, what NIMs are allocated to which GPUS?

yuweiw · January 2, 2025, 7:23am

Hi @kelvin.lwin , could you try to use the configure file and command below to deploy the VSS on 4xH100?
override.yaml (1.4 KB)
Command:

sudo microk8s helm upgrade --install vss-blueprint nvidia-blueprint-vss-2.1.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f override.yaml

kelvin.lwin · January 2, 2025, 4:57pm

Yes, @yuweiw this is working now with forced allocation. Thanks for the help!

@aryason I assume you wanted to know to provide override.yaml. But here’s the the current allocation

yuweiw · January 3, 2025, 3:01am

Glad to hear that. If you have any other issues, feel free to file a new topic.

basha.ghouse · January 3, 2025, 4:27am

Thanks for the help :)

yuweiw · January 10, 2025, 2:06am

We have not deployed that on the 8xA100(40G) GPUs at this time. But you can try to deploy on your side and file a new topic when you encounter problems.
Please refer to the note information below.
When running on an L40 / L40S system, the default startup probe timeout might…

yuweiw · January 10, 2025, 6:19am

Sure. Because the forum you selected is not correct, we did not track it in time. You can choose the visual-ai-agent forum to file the VSS related topic later.

system · January 24, 2025, 6:20am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Deployment of Nvidia VSS Blueprint - vss-vss-deployment POD is failing to initialize Visual AI Agent nim , llama-31-70b-instruct , llama , blueprints	1	149	February 14, 2025
VSS Blueprint Helm Installation- Nemo embedding pod failure Visual AI Agent nim , llama	30	332	May 29, 2025
VSS Installation Visual AI Agent	14	381	February 14, 2025
While making setup for Video search and summarization.there are certain dependence that are not resolved NVIDIA AI Workbench nvbugs	1	57	March 31, 2025
VSS Blueprint Visual AI Agent blueprints	5	165	August 22, 2025
Unable to obtain VSS's Helm Charts Visual AI Agent	5	164	March 27, 2025
Warning Unhealthy kubelet Startup probe failed: Get "v1/health/ready": dial tcp 10.1.124.81:8000: connect: connection refused Visual AI Agent nvbugs , nim , llama	31	736	April 14, 2025
VSS Installation problem Visual AI Agent	11	288	February 21, 2025
Error deploying VSS blueprint Visual AI Agent nim , llama	3	150	March 10, 2025
VSS issue - vss-blueprint-0 keeps restarting Visual AI Agent nvbugs	4	172	February 13, 2025

Getting Error while running blueprint-vss demo

Related topics