Getting Error while running blueprint-vss demo

It runs on 8xL40S since this 3rd configuration was added recently.
That instance is in launchpad so strangely ‘sudo microk8s’ has a lot of issues but running straight with helm and kubectl worked fine.

Also on the 4xH100, when we took out the LLM everything is “running”. Just need to get authorization figured out between local and llama-3.1-70b-instruct Model by Meta | NVIDIA NIM via https://integrate.api.nvidia.com/v1 I believe. Seeing two kinds of errors of connection refused and unauthorized in different trials.

Because the 8xL40S can run normally, it can be confirmed that the problem before is caused by the insufficient resources.

The LLM occupies about half of all the resources, so that everything is OK after taking out that. Could you try to configure-the-nims to setup some keys for the model to verify the connection refused and unauthorized issue?

But I’m using the same exact NIM nvcr.io/nim/meta/llama-3.1-70b-instruct so not clear to me what I should change according to Configure the NIMs — Video Search and Summarization Agent?

I tried to set NVIDIA_API_KEY but maybe I need to set OPENAI_API_KEY also since it’s using openai-compat.

@kelvin.lwin When you are trying to run on 4xH100, what NIMs are allocated to which GPUS?

Hi @kelvin.lwin , could you try to use the configure file and command below to deploy the VSS on 4xH100?
override.yaml (1.4 KB)
Command:

sudo microk8s helm upgrade --install vss-blueprint nvidia-blueprint-vss-2.1.0.tgz --set global.ngcImagePullSecretName=ngc-docker-reg-secret -f override.yaml

Yes, @yuweiw this is working now with forced allocation. Thanks for the help!

@aryason I assume you wanted to know to provide override.yaml. But here’s the the current allocation

2 Likes

Glad to hear that. If you have any other issues, feel free to file a new topic.

Thanks for the help :)

We have not deployed that on the 8xA100(40G) GPUs at this time. But you can try to deploy on your side and file a new topic when you encounter problems.
Please refer to the note information below.
When running on an L40 / L40S system, the default startup probe timeout might…

Sure. Because the forum you selected is not correct, we did not track it in time. You can choose the visual-ai-agent forum to file the VSS related topic later.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.