Getting Error while running blueprint-VSS demo

sagar0 · January 9, 2025, 11:00am

I am currently working with a virtual machine configured with 8xA100 (40GB) GPUs and attempting to run the Blueprint VSS Engine. However, I am encountering several errors during execution. I have attached the relevant files and error logs below.

Could you please assist in troubleshooting and resolving the issues related to this configuration?

I would appreciate guidance on the steps I should take.

Thank you in advance for the help.

vvs-deployment.txt (2.1 KB)
k8_logs.txt (940 Bytes)
describe_vss_logs.txt (4.9 KB)
pods_log.txt (3.0 KB)
get_secrets.txt (527 Bytes)


yuweiw · January 10, 2025, 6:08am

vss-blueprint-0                                        0/1     Running    11 (4m ago)   88m
vss-vss-deployment-5f7959797c-996mq                    0/1     Init:2/3   0             36m

It looks like the two pods above cannot run properly because of insufficient resources.
Since you are using A100 (40GB) GPUs, could you try changing the GPU limit in the resources section to 4?
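To see at a glance which pods are stuck, the READY column of the pod listing can be checked programmatically. The sketch below parses lines like the two above, assuming the default `kubectl get pods` column order (NAME, READY, STATUS, RESTARTS, AGE) with no namespace column (i.e. without `-A`); the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodStatus:
    name: str
    ready: str      # "<ready containers>/<total containers>"
    status: str     # e.g. Running, Init:2/3
    restarts: int
    age: str

    @property
    def is_ready(self) -> bool:
        ready, total = (int(x) for x in self.ready.split("/"))
        return ready == total


def parse_pod_line(line: str) -> PodStatus:
    # RESTARTS may be followed by "(4m ago)", so AGE is read from the last field.
    parts = line.split()
    return PodStatus(parts[0], parts[1], parts[2], int(parts[3]), parts[-1])


def unready_pods(output: str) -> List[PodStatus]:
    # Skip blank lines and the header row, if present.
    pods = [parse_pod_line(l) for l in output.splitlines()
            if l.strip() and not l.startswith("NAME")]
    return [p for p in pods if not p.is_ready]


sample = """\
vss-blueprint-0                                        0/1     Running    11 (4m ago)   88m
vss-vss-deployment-5f7959797c-996mq                    0/1     Init:2/3   0             36m
"""

for pod in unready_pods(sample):
    print(pod.name, pod.status, "restarts:", pod.restarts)
```

Both pods report 0/1 ready: vss-blueprint-0 keeps restarting, while the VSS deployment pod has not yet finished its init containers.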

You can also attach the log by running the command below.

sudo microk8s kubectl logs vss-vss-deployment-POD-NAME
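Because vss-blueprint-0 has restarted several times and the deployment pod is still in Init:2/3, a plain `kubectl logs POD` may show little. `kubectl logs` also accepts `-c CONTAINER` (to select one container, including an init container) and `--previous` (logs from the last terminated instance). The sketch below only builds the command line, assuming the microk8s prefix used above; `INIT-CONTAINER-NAME` is a placeholder to be taken from the `kubectl describe pod` output.

```python
import shlex
from typing import List, Optional


def kubectl_logs_cmd(pod: str, container: Optional[str] = None,
                     previous: bool = False) -> List[str]:
    """Build (but do not run) a microk8s `kubectl logs` command."""
    cmd = ["sudo", "microk8s", "kubectl", "logs", pod]
    if container:
        # -c selects one container in the pod, including an init container
        cmd += ["-c", container]
    if previous:
        # --previous prints logs of the previously terminated container instance
        cmd.append("--previous")
    return cmd


print(shlex.join(kubectl_logs_cmd("vss-blueprint-0", previous=True)))
print(shlex.join(kubectl_logs_cmd("vss-vss-deployment-POD-NAME",
                                  container="INIT-CONTAINER-NAME")))
```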

Also, how much RAM does your system have?

sagar0 · January 10, 2025, 6:29am

Thanks for the reply.

In that case, I have increased my resources to:

16xA100(40GB)
1.3 TB RAM
96 Core CPU

I have attached the .yaml and log files, but I am still facing the same issue.

logs.txt (10.3 KB)
overrides.txt (1.5 KB)

yuweiw · January 10, 2025, 6:35am

Startup may take a long time since you are using A100 (40GB) GPUs. Have you added --set vss.applicationSpecs.vss-deployment.containers.vss.startupProbe.failureThreshold=360 to the helm install command?
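For context on why this threshold matters: a Kubernetes startup probe lets a container run for roughly initialDelaySeconds + failureThreshold × periodSeconds before the kubelet restarts it. The sketch below assumes the Kubernetes default periodSeconds of 10 seconds; the VSS Helm chart may set a different period, so its values should be checked before relying on the number.

```python
from datetime import timedelta


def startup_window(failure_threshold: int, period_seconds: int = 10,
                   initial_delay_seconds: int = 0) -> timedelta:
    """Approximate time a startupProbe tolerates before the container is restarted."""
    return timedelta(seconds=initial_delay_seconds
                     + failure_threshold * period_seconds)


print(startup_window(360))                     # 1:00:00 with periodSeconds=10
print(startup_window(360, period_seconds=20))  # 2:00:00
```

If startup takes longer than this window, the kubelet kills and restarts the container.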

sagar0 · January 10, 2025, 6:39am

  vss:
    applicationSpecs:
      vss-deployment:
        containers:
          vss:
            startupProbe:
              failureThreshold: 360

We tried defining it in overrides.yaml but still faced the same issue.
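A `--set` key and a values file are two ways of writing the same nested mapping: each dot in the key corresponds to one level of indentation in YAML. The hypothetical helper below expands the dotted key from the earlier reply so it can be compared with overrides.yaml; it is simplified and ignores Helm's escaping and list-index syntax. The file only takes effect if it is passed to the install (for example `helm install ... -f overrides.yaml`), and a file whose indentation has been lost would not produce this structure.

```python
import json


def set_key_to_mapping(dotted_key: str, value):
    """Expand a Helm-style dotted --set key into a nested dict (simplified)."""
    node = value
    for key in reversed(dotted_key.split(".")):
        node = {key: node}
    return node


key = ("vss.applicationSpecs.vss-deployment.containers.vss"
       ".startupProbe.failureThreshold")
print(json.dumps(set_key_to_mapping(key, 360), indent=2))
```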

@yuweiw

sagar0 · January 10, 2025, 9:30am

Are there any updates on this issue, please?

yuweiw · January 10, 2025, 10:11am

We do not currently have an 8xA100 (40GB) or 16xA100 (40GB) system on hand, so we cannot guarantee a successful deployment. The smallest per-GPU memory on which we have successfully deployed is 48GB (8xL40S).

The 1.3 TB of RAM you mentioned earlier may actually be your device's storage rather than its RAM. Could you run top and attach the results? VSS requires at least 256 GB of system memory.
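To tell system RAM apart from disk capacity, the MemTotal line in /proc/meminfo (which `top` and `free` also read) reports the usable RAM seen by the kernel, in kB. A minimal sketch that checks it against the 256 GB minimum quoted above; the sample value is illustrative and not from this system. On the VM itself, the function would be called with the contents of /proc/meminfo.

```python
def mem_total_gb(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo text in decimal GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # /proc/meminfo reports kB (units of 1024 bytes)
            return kib * 1024 / 1e9
    raise ValueError("MemTotal line not found")


MIN_SYSTEM_MEMORY_GB = 256  # minimum quoted in the reply above

# Illustrative sample; on the VM, pass open("/proc/meminfo").read() instead.
sample = "MemTotal:       268435456 kB\nMemFree:        1000 kB\n"
total = mem_total_gb(sample)
print(f"{total:.0f} GB, meets minimum: {total >= MIN_SYSTEM_MEMORY_GB}")
```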

You can also try setting the GPU resource limit to 0, which means no limit.

  resources:
    limits:
      nvidia.com/gpu: 0    # no limit

You can also consider the following two deployment methods.

Use a remote LLM endpoint. The steps are described here: Link.
Use the 7B Llama model instead of the 70B Llama model. The steps are described here: Link
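A rough way to see why model size matters on 40GB GPUs is to estimate the memory needed for the LLM weights alone. The sketch assumes 16-bit weights (2 bytes per parameter) and ignores KV cache, activations, runtime overhead, and the other services in the deployment, so real requirements are higher; a quantized model profile would need less. The function names are hypothetical.

```python
import math


def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Memory for model weights alone, in GB (16-bit weights = 2 bytes/param)."""
    return params_billion * bytes_per_param


def min_gpus(params_billion: float, gpu_gb: float,
             bytes_per_param: float = 2.0) -> int:
    """Lower bound on GPUs needed just to hold the weights."""
    return math.ceil(weights_gb(params_billion, bytes_per_param) / gpu_gb)


for size in (7, 70):
    print(f"{size}B: ~{weights_gb(size):.0f} GB of weights, "
          f">= {min_gpus(size, 40)} x A100 40GB")
```

Under these assumptions the 70B model needs at least four 40GB GPUs for its weights alone, whereas a 7B model fits on one, which is why the smaller model or a remote endpoint eases the memory pressure.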

yuweiw · March 5, 2025, 5:41am

There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

system · March 19, 2025, 5:42am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
