AWS EKS - Tao Toolkit Install - ERROR: kubernetes cluster unreachable

Please provide the following information when requesting support.

• Hardware: AWS

• Network Type: I want to use LPD/LPRNet

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I followed the steps here: Setup - NVIDIA Docs

I am doing this in a SageMaker Studio image terminal. I installed the NGC CLI first for step 1.

For step 3 in the Deployment section: Setup - NVIDIA Docs

  • I didn't add anything to the optional API params.

For step 4:

  • I customised the set-up.sh file (attached) to remove sudo and make other minor changes, such as hardcoding the terraform location.
    setup-no-sudo (1).txt (14.3 KB)

When I run the .sh file to do the install, here are my inputs:
(image removed)

I used ssh-keygen to generate the SSH public key.
For the API chart values, the file is empty:
(image removed)

I then get the error from the title: kubernetes cluster unreachable.

What is the issue here? I’ve attached the main.tf file:
main_tf.txt (4.8 KB)

In AWS, I see the cluster active with 1 node. In S3, I see a folder in the bucket with cluster and config.
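Since the cluster and node exist in AWS, one thing worth checking is whether this terminal can reach the cluster's API server at all. The sketch below is only a diagnostic aid, not a confirmed fix: it assumes the AWS CLI and kubectl are installed in the SageMaker Studio image, and `diagnostic_commands` and `run_diagnostics` are hypothetical helper names. The region and cluster name are arguments because this thread does not state them.

```python
import shutil
import subprocess


def diagnostic_commands(region, cluster_name):
    """Return the CLI commands to run, in order, as argument lists."""
    return [
        # Write or refresh the local kubeconfig entry for the EKS cluster.
        ["aws", "eks", "update-kubeconfig", "--region", region, "--name", cluster_name],
        # Ask the API server for basic info; this fails if it is unreachable.
        ["kubectl", "cluster-info"],
        # List the nodes; the AWS console shows one node for this cluster.
        ["kubectl", "get", "nodes"],
    ]


def run_diagnostics(region, cluster_name):
    """Run each command, stopping at the first missing tool or failure."""
    for cmd in diagnostic_commands(region, cluster_name):
        if shutil.which(cmd[0]) is None:
            print(f"{cmd[0]} is not on PATH; install it first")
            return False
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("$", " ".join(cmd))
        print(result.stdout or result.stderr)
        if result.returncode != 0:
            return False
    return True
```

Calling `run_diagnostics` with the region and cluster name used during setup prints each command and its output. If `kubectl cluster-info` fails here as well, the problem is likely with the kubeconfig or network access from this terminal rather than with the TAO chart itself.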

Can someone please provide some visibility into what's going on? I could not get the Python Wheels TAO installation to work on SageMaker Studio, and this isn’t working either. I’m keen to try out your models and the TAO Toolkit, but I may need to move to something else.

Could you please share the full command and logs? Thanks.

Hey Morgan, you helped me get this to work in the other post: NVIDIA-TAO-Deploy -pycocotools-fix issue - Python Wheels TAO implementation porting from Collab to AWS SageMaker Studio

So this is no longer required, for now. Thanks for your help.

Got it. Thanks a lot for the info.
