Having problems when deploying Jarvis ASR service on AWS EKS


I have been trying to deploy the Jarvis ASR service on AWS EKS by following the docs page "Deploying Jarvis ASR Service on AWS EKS" (NVIDIA Jarvis Speech Skills v1.3.0-beta documentation).

The first thing that I noticed is that the doc is outdated…

Also, I have a couple of questions:

  1. In the section "Downloading and modifying the Jarvis API Helm Chart", point 2 says to add something to the file templates/deployment.yaml, but it does not say what to add.

  2. In the section "Defining and launching the EKS Cluster", point 4, when I execute

    helm install --namespace jarvis jarvis .

    it gives an error:

    Error: template: jarvis-api/templates/modeldeploykey.yaml:1:12: executing "jarvis-api/templates/modeldeploykey.yaml" at <len .Values.modelRepoGenerator.modelDeployKey>: error calling len: len of untyped nil

Can anyone help me with this?

Hi @bjoish
Thanks for bringing the doc issue to our attention.

Could you please try the latest API helm chart below and let us know if the issue persists:
helm fetch https://helm.ngc.nvidia.com/nvidia/jarvis/charts/jarvis-api-1.1.0-beta.tgz --username='$oauthtoken' --password=<YOUR API KEY>

Thanks

Hi Sunil and NVIDIA team,

I tried what you suggested above:

and then I tried installing the helm chart:

$ helm install --namespace jarvis jarvis .

Error: template: jarvis-api/templates/modeldeploykey.yaml:1:12: executing "jarvis-api/templates/modeldeploykey.yaml" at <len .Values.modelRepoGenerator.modelDeployKey>: error calling len: len of nil pointer

I am getting the same error as above.

I also followed this guide: "Deploying Jarvis ASR Service on AWS EKS" (NVIDIA Jarvis Speech Skills v1.2.1-beta documentation)

$ helm fetch https://helm.ngc.nvidia.com/ea-2-jarvis/charts/jarvis-api-0.2.1-ea.tgz --username='$oauthtoken' --password=$NGC_API_KEY

Error: failed to fetch https://helm.ngc.nvidia.com/ea-2-jarvis/charts/jarvis-api-0.2.1-ea.tgz : 401 Unauthorized

I am using EKS and need streamlined EKS instructions.

Hi @yuvraj1

Could you please re-validate the NGC_API_KEY, just to rule out a 401 error caused by an API key mismatch?
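
A quick local check can rule out two common causes of a 401 before retrying the fetch. This is only a sketch: it assumes the key is stored in the NGC_API_KEY environment variable, and the name ngc_user is made up for illustration. The username must be the literal string $oauthtoken, so it needs plain single quotes; typographic quotes pasted from a web page are not shell quotes and would let the shell expand $oauthtoken as a (probably empty) variable.

```shell
# Single quotes keep the literal text $oauthtoken (no variable expansion).
ngc_user='$oauthtoken'

# Confirm the key is actually set in this shell before passing it to helm.
if [ -z "${NGC_API_KEY:-}" ]; then
  echo "NGC_API_KEY is empty or unset in this shell" >&2
else
  echo "NGC_API_KEY is set (${#NGC_API_KEY} characters)"
fi

printf 'username that will be sent: %s\n' "$ngc_user"
```

If the key is reported as empty, export it in the same shell session and run the helm fetch again.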

Thanks

Hi @yuvraj1
Could you please try the commands below:

helm fetch https://helm.ngc.nvidia.com/nvidia/jarvis/charts/jarvis-api-1.2.1-beta.tgz --username='$oauthtoken' --password=<YOUR API KEY>
helm install jarvis-api --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` --set ngcCredentials.email=your_email@your_domain.com --set modelRepoGenerator.modelDeployKey=`echo -n tlt_encode | base64 -w0`

Thanks

Hi @SunilJB -

I used the second command and it's invalid:

base64: invalid option -- w
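
The -w0 flag (disable line wrapping) belongs to GNU coreutils base64, and the BSD/macOS base64 rejects it, which is probably what this error means. A portable sketch is to drop -w0 and strip any newlines with tr instead; the variable name encoded_key is only illustrative.

```shell
# Encode without relying on the GNU-only -w0 flag.
# printf '%s' avoids the trailing newline that plain echo would add.
encoded_key=$(printf '%s' 'tlt_encode' | base64 | tr -d '\n')

echo "$encoded_key"
```

The same pattern should work for the ngcCredentials.password value, with $NGC_API_KEY in place of the tlt_encode string.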

Please refer to the link below in case it helps:

Thanks

Hi @SunilJB ,

Thanks, I had to modify my command a bit.

  1. Now I am seeing the jarvis-api pod stuck in Pending:

$ kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
jarvis-api-1624949153-78c66db9c8-ck6hb   0/1     Pending   0          94s

Describing the pod shows:

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  35s (x4 over 100s)  default-scheduler  0/3 nodes are available: 3 Insufficient nvidia.com/gpu.

How many NVIDIA GPUs and nodes do we need?

  2. Can you point me to the correct NVIDIA NGC instructions link for Jarvis installation you mentioned?

@SunilJB - I am having issues pulling this jarvis client container image:

Error response from daemon: pull access denied for nvcr.io/nvidia/jarvis-speech-client, repository does not exist or may require 'docker login': denied: requested access to the resource is denied


What is the issue, and why does it happen? I am logged in:

$ docker login nvcr.io

Authenticating with existing credentials…

Login did not succeed, error: Error response from daemon: Get https://nvcr.io/v2/: unauthorized: authentication required

Username ($oauthtoken):

Password:

Login Succeeded

Hi @yuvraj1
You need to have 3 nodes in your cluster, and those nodes need to have GPUs.
Also, make sure you have the GPU operator plugin installed.
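
To check both points, something like the following may help. It is a sketch that assumes kubectl is already pointed at the EKS cluster; the function names are made up for illustration. The scheduler message "0/3 nodes are available: 3 Insufficient nvidia.com/gpu" suggests the three existing nodes do not currently advertise a free nvidia.com/gpu resource.

```shell
# List each node with the number of GPUs it advertises to the scheduler.
# "<none>" in the GPU column means the node exposes no nvidia.com/gpu resource.
show_gpu_nodes() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# List NVIDIA-related pods (device plugin, GPU operator) in every namespace.
show_nvidia_pods() {
  kubectl get pods --all-namespaces | grep -i nvidia \
    || echo "no NVIDIA pods found"
}

# Only run the checks when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  show_gpu_nodes
  show_nvidia_pods
else
  echo "kubectl not found on PATH" >&2
fi
```

If the GPU column shows <none> on GPU instances, the device plugin is most likely missing or not running on those nodes.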

Thanks

Do you need to add something like this to the spec? @SunilJB

managedNodeGroups:
  - name: gpu-linux-workers
    instanceType: p3.2xlarge

Or is it

nodeSelector:
  eks.amazonaws.com/nodegroup: gpu-linux-workers

Hi @pineapple9011
Sorry for the delayed response. Just wanted to check whether you are still facing the issue.

Thanks