Local model storage for VSS - LLM and VLM

I’m running VSS 2.2.0 with NVILA model.

I need to store all the models locally for an airgapped, repeatable deployment.

In the values.yaml file for the VSS blueprint, I wanted to ask how to store the LLM locally. This is my current nim-llm section:

nim-llm:
  image:
    repository: 192.168.51.7:5000/nvcr.io/nim/meta/llama-3.1-70b-instruct
    tag: 1.3.3
  resources:
    limits:
      nvidia.com/gpu: 2
  nodeSelector:
    kubernetes.io/hostname: worker-2
  model:
    name: meta/llama-3.1-70b-instruct
    ngcAPISecret: ngc-api-key-secret
  persistence:
    enabled: true
  hostPath:
    enabled: true
  service:
    name: llm-nim-svc
  llmModel: meta/llama-3.1-70b-instruct

For the VLM I have these volume settings, but I didn't see an equivalent for the LLM:


  #### LOCAL MODEL STORE ON WORKER NODE
  - name: local-model-store
    hostPath:
      path: /data/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8


  #### LOCAL MODEL STORE IN POD
  - name: local-model-store
    mountPath: /tmp/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8

I have used the ngc registry command to download both the VLM and the LLM:

ngc registry model download-version "nim/nvidia/vila-1.5-40b:vila-yi-34b-siglip-stage3_1003_video_v8"

ngc registry model download-version "nvidia/nemo/llama-3_1-70b-instruct-nemo:1.0"

I just need to modify the values.yaml file in the NIM LLM section so that it points to the hostPath of the model I downloaded.

thanks!!

After the deployment is successful, you can find the directory related to the model inside the LLM pod. Then map that directory to the local path.
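For example, something along these lines (a rough sketch: the pod name, namespace, and cache path depend on your deployment, and /opt/nim/.cache is only the usual NIM default):

kubectl exec -it <llm-nim-pod> -n <namespace> -- ls /opt/nim/.cache    # inspect where the NIM keeps its model cache
kubectl cp <namespace>/<llm-nim-pod>:/opt/nim/.cache /data/nim-cache   # copy that cache to the host so it can be remapped later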

Thank you for your response, but can you provide me with specific steps that you have used to accomplish this?

What is the directory related to the model in the LLM pod in the VSS deployment?

If I have a “/data” directory with the model on my local host, where do I configure it to be mapped in the VSS deployment?

nim-llm:
  image:
    repository: 192.168.51.11:5000/nvcr.io/nim/meta/llama-3.1-70b-instruct
    tag: 1.3.3
  resources:
    limits:
      nvidia.com/gpu: 2
  nodeSelector:
    kubernetes.io/hostname: worker-2
  model:
    name: meta/llama-3.1-70b-instruct
    ngcAPISecret: ngc-api-key-secret
#  persistence:
#    enabled: true
  hostPath:
    enabled: true
    path: /data/llama-3_1-70b-instruct-nemo_v1.0

  service:
    name: llm-nim-svc
  llmModel: meta/llama-3.1-70b-instruct

#  extraVolumes:
#  #### LOCAL MODEL STORE ON WORKER NODE
#  - name: local-model-store
#    hostPath:
#      path: /data/llama-3_1-70b-instruct-nemo_v1.0
#
#  extraVolumeMounts:
#  #### LOCAL MODEL STORE IN POD
#  - name: local-model-store
#    mountPath: /tmp/llama-3_1-70b-instruct-nemo_v1.0
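A variant I'm considering is mounting the host directory over the NIM model cache path instead of /tmp. This is only a sketch: it assumes the nim-llm chart passes extraVolumes/extraVolumeMounts through to the pod and that the container reads its model cache from /opt/nim/.cache (the usual NIM default), neither of which I've been able to confirm:

nim-llm:
  extraVolumes:
  #### LOCAL MODEL STORE ON WORKER NODE
  - name: local-model-store
    hostPath:
      path: /data/llama-3_1-70b-instruct-nemo_v1.0
  extraVolumeMounts:
  #### LOCAL MODEL STORE IN POD (over the default NIM cache location)
  - name: local-model-store
    mountPath: /opt/nim/.cache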

So your requirement is to map all the models locally, right? We'll discuss how to implement this requirement as soon as possible.

Correct. We need at least the large LLM and the VLM saved locally for applications running in an air-gapped, on-prem environment that cannot access the public internet after deployment. Many of our customers will require this same feature, thank you for your help! :-)

This will be stored in your container and will work in an air-gapped, on-prem environment after the first download.

Right, but if we redeploy, it will require downloading again. That is the problem. If we keep the PVCs and don't delete them, the new deployment won't recognize the previous PVCs, so in order to redeploy we have to delete all the previous PVCs and start over with a new download.
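One workaround I'm looking at is pre-creating a PersistentVolume backed by a hostPath and binding the chart to it, so a redeploy attaches to the same storage instead of downloading again. This is only a sketch under assumptions I haven't verified: that the nim-llm chart exposes a persistence.existingClaim key, and that the pre-populated directory layout matches what the NIM expects.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: llm-model-cache-pv
spec:
  capacity:
    storage: 200Gi          # example size
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data when the claim is deleted
  storageClassName: ""
  hostPath:
    path: /data/nim-cache   # hypothetical host directory holding the model cache
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llm-model-cache-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""
  resources:
    requests:
      storage: 200Gi
  volumeName: llm-model-cache-pv

# values.yaml (assumed key name, not confirmed for this chart)
nim-llm:
  persistence:
    enabled: true
    existingClaim: llm-model-cache-pvc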

If we could have some configuration for the LLM to host the model locally with a hostPath or volumeMount, similar to the VLM, that would solve the problem when we redeploy.

Is there a method to map a local model folder to the LLM in the VSS-Engine pod?

OK. We will discuss this requirement and consider whether to implement it in a future release. Thanks.