Sorry in advance if this is not the most suitable place for questions related to ACE. I previously posted in the Omniverse section of the forum and was told that it is not the right place for questions about ACE.
I successfully deployed the Tokkio demo on my own instances on AWS, and now I am trying to replace the default avatar. I followed the instructions here and here.
In particular, I created and exported a new scene using Avatar Studio, uploaded it to my private NGC registry, and updated the ACE/workflows/tokkio/qsr/tokkio-app-params.yaml file to point to that registry:
Finally, I rebuilt the Tokkio app with ucf_app_builder_cli app build tokkio-app.yaml tokkio-app-params.yaml and deployed the app with the tokkio-deploy script. I used the same deployment template (deploy-template.yml) as for the unmodified Tokkio app. The end result is that the Tokkio app ends up running, but does not load my custom avatar. Instead it still loads the default avatar.
I also tried to push the Helm Charts for my local build of Tokkio to NGC registry (ngc registry chart push --org <redacted> --team <redacted> <redacted>/<redacted>/ucs-tokkio-audio-video-app:4.1.0) and modified the deployment template file to point to these private registry charts:
This does not work either. This time, the app fails to load any avatar or scene at all.
Is it possible that the Kubernetes container running on the EC2 instance fails to fetch the correct custom scenes from the private registry because it's missing the NGC API key? I did provide the key in tokkio/scripts/one-click/aws/secrets.sh, but is that enough? If not, what other steps do I need to take?
Follow up:
I managed to SSH into the EC2 instance where the app is deployed and I can confirm (by doing kubectl get secrets/ngc_api_key -o yaml and subsequently decoding the output) that the NGC API token is set correctly.
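The check described above can be sketched roughly as follows. This is only an illustration: the helper names decode_secret_value and show_secret_key are hypothetical, and the key name inside the secret (NGC_CLI_API_KEY in the commented example) is an assumption, so check the output of kubectl get secret ... -o yaml for the real one.

```shell
# Sketch: read one field of a Kubernetes secret and decode it.
# Values under .data in a secret are stored base64-encoded.

# Hypothetical helper: decode a single base64-encoded value.
decode_secret_value() {
  printf '%s' "$1" | base64 --decode
}

# Hypothetical helper: fetch one key of a secret and decode it.
# $1 = secret name, $2 = key inside the secret's .data map.
show_secret_key() {
  encoded=$(kubectl get secret "$1" -o jsonpath="{.data.$2}") || return 1
  decode_secret_value "$encoded"
}

# Example (needs cluster access; the key name is an assumption):
# show_secret_key ngc_api_key NGC_CLI_API_KEY
```

Comparing the decoded value with the key used locally for ngc confirms only that the secret is correct, not that the init container actually uses it.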
Also, if I do NOT set api_settings->chart_org and api_settings->chart_team inside deploy-template.yml, then by running kubectl describe pod ia-animation-graph-microservice-deployment-0 on the EC2 instance, I see that the pod tries to download nvidia/ucs-ms/default-avatar-scene:1.0.0, which explains why the app still uses the default Avatar instead of the custom one in my private registry.
On the other hand, by setting api_settings->chart_org and api_settings->chart_team to point to my private org and team, the same kubectl command displays:
So now it is trying to download the correct custom avatar resources from my private NGC registry, but the download fails and the corresponding pods (ia-animation-graph-microservice-deployment-0, ia-omniverse-renderer-microservice-deployment-0, etc.) are stuck in an Init:CrashLoopBackOff state.
If I do kubectl logs ia-animation-graph-microservice-deployment-0 --all-containers to inspect the logs of the pod that fails to start, it says:
Error: 'org_id_redacted/team_id_redacted/avatar_scene:1.0.0' could not be found.
Error from server (BadRequest): container "ms" in pod "ia-animation-graph-microservice-deployment-0" is waiting to start: PodInitializing
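The second error above comes from the main container "ms", which has not started yet, so the useful output is the init container's. A minimal sketch for printing each init container's log separately, assuming kubectl access to the cluster (the function names are hypothetical):

```shell
# Sketch: print the logs of each init container of a pod separately,
# so the failing one is easy to spot.

# List the init container names declared in the pod spec.
init_container_names() {
  kubectl get pod "$1" -o jsonpath='{.spec.initContainers[*].name}'
}

# Print logs for every init container, one at a time.
# A container that never ran has no logs, so errors are ignored.
init_container_logs() {
  pod="$1"
  for name in $(init_container_names "$pod"); do
    echo "=== init container: $name ==="
    kubectl logs "$pod" -c "$name" || true
  done
}

# Example (needs cluster access):
# init_container_logs ia-animation-graph-microservice-deployment-0
```

Selecting the container with -c avoids the PodInitializing error that --all-containers reports for the main container.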
You seem to be one of the few on the forum who actually got the reference app installed. I was able to set up all the infrastructure components (in Azure) and package the reference app with ucf_app_builder_cli app build tokkio-app.yaml tokkio-app-params.yaml, but I am not sure what I need to replace in the tokkio-deploy script to get it to deploy my ucs-tokkio-audio-video-4.1.0.tgz file to my bastion VM. Is it something you changed in the deployment-template.yml file?
Hi @kent25
The Tokkio deployment documentation from NVIDIA omits an important step. After you run ucf_app_builder_cli and helm package, you need to upload the resulting zip file to your private NGC registry. You probably need access to NVIDIA AI Enterprise (a trial is OK) to be able to create an NGC registry.
Once that's done, you need to add at least api_settings.chart_org and api_settings.chart_team to your deploy-template.yml, like this:
Hi @dalei ,
The error Error: 'org_id_redacted/team_id_redacted/avatar_scene:1.0.0' could not be found. usually happens when the resource either does not exist or is not accessible.
Could you try the following?
Double-check that the resource is available in your private NGC registry.
Check that the org name and team name in your overrides match the registry.
Try downloading the resource using the NGC CLI.
You can do this on any machine with the NGC CLI installed. If you still have the CSP setup available, you could also use the AWS EC2 instance (the one with GPUs) to do the same.
export NGC_CLI_API_KEY="<your api key here>"
ngc registry resource download-version <resource full path:version> --org <org-name>
e.g.
ngc registry resource download-version nvidia/ucs-ms/default-avatar-scene:1.0.0 --org nvidia
You should see output similar to this:
Hi @sarathm,
I can confirm that the org id and team id are correct. I ran the ngc registry resource download-version test and was able to re-download the avatar scene I uploaded to my private registry. Digging a bit deeper, I now suspect that the problem lies with the nvcr.io/eevaigoeixww/animation/ngc-resource-downloader:1.0.1 container from NVIDIA.
By running kubectl describe pods ia-animation-graph-microservice-deployment-0, I understand that nvcr.io/eevaigoeixww/animation/ngc-resource-downloader:1.0.1 is the init container for this pod, and it executes a shell script named download_resource.sh:
If I take a peek into the download_resource.sh, I see these lines of code responsible for downloading the asset:
# Unset the NGC_CLI_API_KEY since we cannot download NGC resources from a NGC catalog when this is environment variable is set through the secret.
unset NGC_CLI_API_KEY
# Configure NGC CLI and download asset
ngc --version
# ngc config set --format_type ascii --org ${NGC_RESOURCE_PATH_SPLIT[0]} --team ${NGC_RESOURCE_PATH_SPLIT[1]}
ngc config set --format_type ascii
ngc config current
ngc registry resource download-version "$REMOTE_RESOURCE_PATH"
I see two problems with this code:
The line that sets the org id and team id is commented out.
There is no code that sets the NGC API key to give access to a private registry. In fact, the key is explicitly unset.
In essence, in its current state, the NGC automatic resource downloader will only work for resources under the default nvidia/ucs-ms/ public registry. Is this intentional? If not, could you contact the relevant people so they can push out a patch? Thanks.
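For illustration, here is a rough sketch of what a private-registry-aware download step might look like. It is not NVIDIA's script. It assumes NGC_CLI_API_KEY stays set from the secret, that REMOTE_RESOURCE_PATH has the form org/team/name:version (or org/name:version), and that --team is accepted alongside the --org option shown earlier in this thread. The function names are hypothetical.

```shell
# Sketch of a download step that targets a private registry.
# Assumption: NGC_CLI_API_KEY is NOT unset, so the CLI can authenticate.

# Split "org/team/name:version" (or "org/name:version") into ORG and TEAM.
split_resource_path() {
  path="${1%%:*}"        # drop the ":version" suffix
  ORG="${path%%/*}"      # first path segment
  rest="${path#*/}"      # everything after the org
  case "$rest" in
    */*) TEAM="${rest%%/*}" ;;   # org/team/name
    *)   TEAM="" ;;              # org/name, no team
  esac
}

# Download the resource, passing org (and team, if any) explicitly.
download_private_resource() {
  split_resource_path "$1"
  if [ -n "$TEAM" ]; then
    ngc registry resource download-version "$1" --org "$ORG" --team "$TEAM"
  else
    ngc registry resource download-version "$1" --org "$ORG"
  fi
}

# Example (needs the ngc CLI and a valid API key):
# download_private_resource "$REMOTE_RESOURCE_PATH"
```

Note that the original script's comment says public catalog downloads fail when the key is set, so a real fix would probably need to handle both cases.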
I created my own resource downloader and was able to download the asset from a private repository. I customized the avatar using the latest Avatar Studio and published it as a private resource on NGC. The problem was in the default downloader: it was built with an old version of the NGC CLI and does not fetch the latest one. My own downloader uses a newer version (NGC CLI 3.51.0).