Tokkio Avatar customization

Sorry in advance if this is not the most suitable place for questions related to ACE. I previously posted in the Omniverse section of the forum and was told that it is not the right place for questions about ACE.

I successfully deployed the Tokkio demo on my own instances on AWS, and now I am trying to replace the default avatar. I followed the instructions here and here.

In particular, I created and exported a new scene using Avatar Studio, uploaded it to my private NGC registry, and updated the ACE/workflows/tokkio/qsr/tokkio-app-params.yaml file to point to that registry:

animation-graph:
  ...
  resourceDownload:
    remoteResourcePath: "org_id_redacted/team_redacted/avatar_scene:1.0.0"
    secretName: ngc-api-key-secret
    image: nvcr.io/eevaigoeixww/animation/ngc-resource-downloader:1.0.1
avatar-renderer:
  ...
  resourceDownload:
    remoteResourcePath: "org_id_redacted/team_redacted/avatar_scene:1.0.0"
    secretName: ngc-api-key-secret
    image: nvcr.io/eevaigoeixww/animation/ngc-resource-downloader:1.0.1

Finally, I rebuilt the Tokkio app with ucf_app_builder_cli app build tokkio-app.yaml tokkio-app-params.yaml and deployed the app with the tokkio-deploy script. I used the same deployment template (deploy-template.yml) as for the unmodified Tokkio app. The end result is that the Tokkio app ends up running, but does not load my custom avatar. Instead it still loads the default avatar.
I also tried to push the Helm Charts for my local build of Tokkio to NGC registry (ngc registry chart push --org <redacted> --team <redacted> <redacted>/<redacted>/ucs-tokkio-audio-video-app:4.1.0) and modified the deployment template file to point to these private registry charts:

  api_settings:
    openai_api_key: '${_openai_api_key}'
    chart_org: '<redacted>'
    chart_team: '<redacted>'

This does not work either. This time, the app fails to load any avatar or scene at all.
Is it possible that the Kubernetes container running on the EC2 instance fails to fetch the correct custom scenes from the private registry because it is missing the NGC API key? I did provide the key in tokkio/scripts/one-click/aws/secrets.sh, but is that enough? If not, what other steps do I need to take?

Follow up:
I managed to SSH into the EC2 instance where the app is deployed and I can confirm (by doing kubectl get secrets/ngc_api_key -o yaml and subsequently decoding the output) that the NGC API token is set correctly.
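The secret check described above could be scripted roughly as follows. This is only a sketch: the secret name `ngc-api-key-secret` is taken from the `secretName` field in tokkio-app-params.yaml, while the data key `NGC_CLI_API_KEY` is an assumption, so list the real keys first with `kubectl describe secret`. Key names containing dots would need escaping in the jsonpath expression.

```shell
# Decode one base64-encoded value, as stored in a Kubernetes Secret's .data map.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

# Print the decoded value of one key of a secret.
# Both arguments are placeholders: adjust them to your deployment.
show_secret_key() {
  secret_name=$1   # e.g. ngc-api-key-secret (from tokkio-app-params.yaml)
  key=$2           # hypothetical key name, e.g. NGC_CLI_API_KEY
  encoded=$(kubectl get secret "$secret_name" -o "jsonpath={.data.$key}") || return 1
  decoded=$(decode_secret_value "$encoded") || return 1
  printf '%s\n' "$decoded"
}

# Usage (on the EC2 instance):
#   show_secret_key ngc-api-key-secret NGC_CLI_API_KEY
```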

Also, if I do NOT set api_settings->chart_org and api_settings->chart_team inside deploy-template.yml, then by running kubectl describe pod ia-animation-graph-microservice-deployment-0 on the EC2 instance, I see that the pod tries to download nvidia/ucs-ms/default-avatar-scene:1.0.0, which explains why the app still uses the default Avatar instead of the custom one in my private registry.

On the other hand, by setting api_settings->chart_org and api_settings->chart_team to point to my private org and team, the same kubectl command displays:

    Environment:
      REMOTE_RESOURCE_PATH:       org_id_redacted/team_id_redacted/avatar_scene:1.0.0

So now it is trying to download the correct custom avatar resources from my private NGC registry, but the download fails and the corresponding pods (ia-animation-graph-microservice-deployment-0, ia-omniverse-renderer-microservice-deployment-0, etc.) are stuck in an Init:CrashLoopBackOff state.

If I do kubectl logs ia-animation-graph-microservice-deployment-0 --all-containers to inspect the logs of the pod that fails to start, it says:

Error: 'org_id_redacted/team_id_redacted/avatar_scene:1.0.0' could not be found.
Error from server (BadRequest): container "ms" in pod "ia-animation-graph-microservice-deployment-0" is waiting to start: PodInitializing
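For anyone repeating this diagnosis, the two checks above (which resource the pod requests, and what its init container logged) can be combined in a small helper. This is a sketch, not part of the Tokkio tooling: the function names are made up, and the init container is assumed to be named `init`, which should be confirmed in the "Init Containers" section of `kubectl describe pod`.

```shell
# Pull the REMOTE_RESOURCE_PATH value out of `kubectl describe pod` output read from stdin.
extract_resource_path() {
  sed -n 's/^[[:space:]]*REMOTE_RESOURCE_PATH:[[:space:]]*//p'
}

# Show which resource a pod will try to download, then the logs of its init container.
# The container name "init" is an assumption; check `kubectl describe pod` first.
inspect_downloader() {
  pod=$1
  kubectl describe pod "$pod" | extract_resource_path
  kubectl logs "$pod" -c init
}

# Usage:
#   inspect_downloader ia-animation-graph-microservice-deployment-0
```

Asking for the init container explicitly with `-c` avoids the "waiting to start: PodInitializing" error that `--all-containers` reports for the main container.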

You seem to be one of the few people in the forum who actually got the reference app installed. I was able to set up all the infrastructure components (in Azure) and package the reference app with ucf_app_builder_cli app build tokkio-app.yaml tokkio-app-params.yaml, but I am not sure what I need to replace in the tokkio-deploy script to get it to deploy my ucs-tokkio-audio-video-4.1.0.tgz file to my bastion VM. Is it something you changed in the deployment-template.yml file?

Hi @kent25
The Tokkio deployment documentation from NVIDIA omits an important step. After you run ucf_app_builder_cli and helm package, you need to upload the resulting chart archive (.tgz) to your private NGC registry. You probably need access to NVIDIA AI Enterprise (a trial is OK) to be able to create an NGC registry.
Once that is done, you need to add at least api_settings.chart_org and api_settings.chart_team to your deploy-template.yml, like this:

  api_settings:
    openai_api_key: '${_openai_api_key}'
    chart_org: '<your org id>'
    chart_team: '<your team id>'

The org ID is assigned to you and is shown in the NGC web portal; the team ID is one you create yourself.


Hi @dalei ,
The error Error: 'org_id_redacted/team_id_redacted/avatar_scene:1.0.0' could not be found. usually means that the resource either does not exist or is not accessible.
Could you try the following?

  1. Double-check that the resource is available in your private NGC registry.
  2. Check that the org name and team name in your overrides match the registry.
  3. Try downloading the resource using the NGC CLI.
    You can do this on any machine that has the NGC CLI installed. If your CSP setup is still available, you could also use the AWS EC2 instance (the one with GPUs).
export NGC_CLI_API_KEY="<your api key here>"
ngc registry resource download-version  <resource full path:version> --org <org-name>
e.g. 
ngc registry resource download-version  nvidia/ucs-ms/default-avatar-scene:1.0.0 --org nvidia

You should see output similar to this:

Getting files to download...
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ • 1.4/1.4 GiB • Remaining: 0:00:00 • 40.9 MB/s • Elapsed: 0:00:29 • Total: 817 - Completed: 817 - Failed: 0

-------------------------------------------------------------------------------
   Download status: COMPLETED
   Downloaded local path resource: /home/home_dir_redacted/default-avatar-scene_v1.0.0
   Total files downloaded: 817
   Total transferred: 1.35 GB
   Started at: 2024-09-10 23:36:05
   Completed at: 2024-09-10 23:36:35
   Duration taken: 30s
-------------------------------------------------------------------------------

A very important step indeed. Thanks for that solve.

Hi @sarathm,
I can confirm that the org ID and team ID are correct. I ran the ngc registry resource download-version test and was able to re-download the avatar scene I uploaded to my private registry. Digging a bit deeper, I now suspect that the problem lies with the nvcr.io/eevaigoeixww/animation/ngc-resource-downloader:1.0.1 container from NVIDIA.

By running kubectl describe pods ia-animation-graph-microservice-deployment-0, I understand that nvcr.io/eevaigoeixww/animation/ngc-resource-downloader:1.0.1 is the init container for this pod, and it executes a shell script named download_resource.sh:

Controlled By:  StatefulSet/ia-animation-graph-microservice-deployment
Init Containers:
  init:
    Container ID:  containerd://42bcc4bc088fa63c8b927cb6be161180e028e0015479254f492e62e347b36f07
    Image:         nvcr.io/eevaigoeixww/animation/ngc-resource-downloader:1.0.1
    Image ID:      nvcr.io/eevaigoeixww/animation/ngc-resource-downloader@sha256:dbfd2b4c4f156169d93cb0e7be9a7229ff690f8a8f098dde3450b1353376b857
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      download_resource.sh

If I take a peek into the download_resource.sh, I see these lines of code responsible for downloading the asset:

# Unset the NGC_CLI_API_KEY since we cannot download NGC resources from a NGC catalog when this is environment variable is set through the secret.
unset NGC_CLI_API_KEY

# Configure NGC CLI and download asset
ngc --version
# ngc config set --format_type ascii --org ${NGC_RESOURCE_PATH_SPLIT[0]} --team ${NGC_RESOURCE_PATH_SPLIT[1]}
ngc config set --format_type ascii
ngc config current
ngc registry resource download-version "$REMOTE_RESOURCE_PATH"

I see two problems with this code:

  1. The line that sets the org id and team id is commented out.
  2. There is no code to set the NGC API key to give access to a private registry

In essence, in its current state, the NGC automatic resource downloader will only work for resources under the default nvidia/ucs-ms/ public registry. Is this intentional? If not, could you contact the relevant people so they can push out a patch? Thanks.
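To illustrate what I would expect instead, here is a rough sketch of a download step that keeps the API key and passes the org and team explicitly. It is not NVIDIA's script: the function names and the RES_* variables are hypothetical, and it assumes the `--org` and `--team` options behave for `ngc registry resource download-version` as they do in the other ngc commands in this thread.

```shell
# Split "org/team/name:version" or "org/name:version" into RES_ORG, RES_TEAM, RES_NAME.
parse_resource_path() {
  case "$1" in
    */*) ;;
    *) echo "unexpected resource path: $1" >&2; return 1 ;;
  esac
  RES_ORG=${1%%/*}
  rest=${1#*/}
  case "$rest" in
    */*) RES_TEAM=${rest%%/*}; RES_NAME=${rest#*/} ;;
    *)   RES_TEAM=""; RES_NAME=$rest ;;
  esac
}

# Download a resource with the API key left in place and the org/team passed explicitly.
download_private_resource() {
  parse_resource_path "$1" || return 1
  if [ -z "${NGC_CLI_API_KEY:-}" ]; then
    echo "NGC_CLI_API_KEY is not set" >&2
    return 1
  fi
  if [ -n "$RES_TEAM" ]; then
    ngc registry resource download-version "$1" --org "$RES_ORG" --team "$RES_TEAM"
  else
    ngc registry resource download-version "$1" --org "$RES_ORG"
  fi
}

# Usage:
#   download_private_resource "$REMOTE_RESOURCE_PATH"
```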

Hey @dalei - I’m facing the same issue. Did you make any progress on it?

Hi @dalei1
This will be fixed in the next version. In the meantime, you can work around it by using the “Manual Resource Downloader”.
The steps are detailed here: Resource Downloader - ACE documentation latest documentation

Did you get a chance to try one of the alternative methods, such as the manual resource downloader? Resource Downloader - ACE documentation latest documentation

I created my own resource downloader and used it to download the asset from my private repository. I had customized the avatar with the latest Avatar Studio and published it as a private resource on NGC. The problem was the default downloader: it is built on an old version of the NGC CLI and does not pick up the latest one, whereas I published the avatar with a newer version (NGC CLI 3.51.0).
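If you build your own downloader image, a version gate along these lines may help catch an outdated CLI early. This is a sketch under assumptions: 3.51.0 is simply the version mentioned above, not a documented minimum, `ngc --version` is assumed to print the version number as its last field, and `sort -V` requires GNU coreutils or a compatible sort.

```shell
# Return 0 when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort).
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Check the installed NGC CLI against a minimum version.
# Assumes `ngc --version` prints the version number as its last field.
require_ngc_version() {
  installed=$(ngc --version | awk '{print $NF}')
  version_ge "$installed" "$1"
}

# Usage:
#   require_ngc_version 3.51.0 || echo "NGC CLI is older than 3.51.0" >&2
```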