Local model storage for VSS - LLM and VLM

I’m running VSS 2.2.0 with NVILA model.

I need to store all the models locally for an airgapped, repeatable deployment.

In the values.yaml file for the VSS blueprint, I wanted to ask how to store the LLM locally. This is my current nim-llm section:

nim-llm:
  image:
    repository: 192.168.51.7:5000/nvcr.io/nim/meta/llama-3.1-70b-instruct
    tag: 1.3.3
  resources:
    limits:
      nvidia.com/gpu: 2
  nodeSelector:
    kubernetes.io/hostname: worker-2
  model:
    name: meta/llama-3.1-70b-instruct
    ngcAPISecret: ngc-api-key-secret
  persistence:
    enabled: true
  hostPath:
    enabled: true
  service:
    name: llm-nim-svc
  llmModel: meta/llama-3.1-70b-instruct

For the VLM I have these volume settings, but I didn't see an equivalent for the LLM:


  #### LOCAL MODEL STORE ON WORKER NODE
  - name: local-model-store
    hostPath:
      path: /data/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8


  #### LOCAL MODEL STORE IN POD
  - name: local-model-store
    mountPath: /tmp/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8

I have used the ngc registry command to download both the VLM and the LLM:

ngc registry model download-version "nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"

ngc registry model download-version "nvidia/nemo/llama-3_1-70b-instruct-nemo:1.0"

I just need to modify the values.yaml file in the NIM LLM section so that it points to the hostPath of the model I downloaded.

thanks!!

After the deployment is successful, you can find the directory related to the model inside the LLM pod. Then map that directory to the local path.
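For example, something along these lines (a rough sketch: the pod name, namespace, and cache path depend on your deployment, and /opt/nim/.cache is only the usual NIM default):

kubectl exec -it <llm-nim-pod> -n <namespace> -- ls /opt/nim/.cache    # inspect where the NIM keeps its model cache
kubectl cp <namespace>/<llm-nim-pod>:/opt/nim/.cache /data/nim-cache   # copy that cache to the host so it can be remapped later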

Thank you for your response, but can you provide me with specific steps that you have used to accomplish this?

What is the directory related to the model in the LLM pod in the VSS deployment?

If I have a “/data” directory with the model on my local host, where do I configure it to be mapped in the VSS deployment?

nim-llm:
  image:
    repository: 192.168.51.11:5000/nvcr.io/nim/meta/llama-3.1-70b-instruct
    tag: 1.3.3
  resources:
    limits:
      nvidia.com/gpu: 2
  nodeSelector:
    kubernetes.io/hostname: worker-2
  model:
    name: meta/llama-3.1-70b-instruct
    ngcAPISecret: ngc-api-key-secret
#  persistence:
#    enabled: true
  hostPath:
    enabled: true
    path: /data/llama-3_1-70b-instruct-nemo_v1.0

  service:
    name: llm-nim-svc
  llmModel: meta/llama-3.1-70b-instruct

#  extraVolumes:
#  #### LOCAL MODEL STORE ON WORKER NODE
#  - name: local-model-store
#    hostPath:
#      path: /data/llama-3_1-70b-instruct-nemo_v1.0
#
#  extraVolumeMounts:
#  #### LOCAL MODEL STORE IN POD
#  - name: local-model-store
#    mountPath: /tmp/llama-3_1-70b-instruct-nemo_v1.0
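A variant I'm considering is mounting the host directory over the NIM model cache path instead of /tmp. This is only a sketch: it assumes the nim-llm chart passes extraVolumes/extraVolumeMounts through to the pod and that the container reads its model cache from /opt/nim/.cache (the usual NIM default), neither of which I've been able to confirm:

nim-llm:
  extraVolumes:
  #### LOCAL MODEL STORE ON WORKER NODE
  - name: local-model-store
    hostPath:
      path: /data/llama-3_1-70b-instruct-nemo_v1.0
  extraVolumeMounts:
  #### LOCAL MODEL STORE IN POD (over the default NIM cache location)
  - name: local-model-store
    mountPath: /opt/nim/.cache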

So your requirement is to map all the models locally, right? We'll discuss how to implement this requirement as soon as possible.

Correct. We need at least the large LLM and the VLM saved locally for applications running in an air-gapped, on-prem environment that cannot access the public internet after deployment. Many of our customers will require this same feature, thank you for your help! :-)

This will be stored in your container and will work in an air-gapped, on-prem environment after the first download.

Right, but if we redeploy, it will require downloading again. That is the problem. If we keep the PVCs and don't delete them, the new deployment won't recognize the previous PVCs, so in order to redeploy we have to delete all the previous PVCs and start over with a new download.
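One workaround I'm looking at is pre-creating a PersistentVolume backed by a hostPath and binding the chart to it, so a redeploy attaches to the same storage instead of downloading again. This is only a sketch under assumptions I haven't verified: that the nim-llm chart exposes a persistence.existingClaim key, and that the pre-populated directory layout matches what the NIM expects.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: llm-model-cache-pv
spec:
  capacity:
    storage: 200Gi          # example size
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data when the claim is deleted
  storageClassName: ""
  hostPath:
    path: /data/nim-cache   # hypothetical host directory holding the model cache
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llm-model-cache-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""
  resources:
    requests:
      storage: 200Gi
  volumeName: llm-model-cache-pv

# values.yaml (assumed key name, not confirmed for this chart)
nim-llm:
  persistence:
    enabled: true
    existingClaim: llm-model-cache-pvc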

If we could have some configuration for the LLM to host the model locally with a hostPath or volumeMount, similar to the VLM, that would solve the problem when we redeploy.

Is there a method to map a local model folder to the LLM in the VSS-Engine pod?

OK. We will discuss this requirement and consider whether to implement it in a future release. Thanks.