TAO API - bare metal install - Connection Refused after TAO API re-install


• Hardware: AMD64, Ubuntu 20.04, TAO 4.0.2 bare-metal API install

I did a bare-metal install of the TAO API and everything was working (with the adjustments for the known connection issue). After discovering a data problem and needing my GPU, I uninstalled the TAO API to resolve the issues. I have since reinstalled the TAO API on bare metal. The second install went fine (note

PLAY RECAP *********************************************************************
                           : ok=25   changed=15   unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
localhost                  : ok=10   changed=5    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
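To avoid eyeballing the recap after a reinstall, the counters can be checked in a few lines. A minimal sketch, assuming the Ansible PLAY RECAP format shown above (`host : key=value ...`); `parse_recap_line` and `recap_problems` are hypothetical helpers, not part of the TAO setup scripts.

```python
import re
from typing import Dict, List

# Matches "key=number" pairs such as "ok=25" or "failed=0" in an Ansible recap line.
_COUNTER = re.compile(r"(\w+)=(\d+)")


def parse_recap_line(line: str) -> Dict[str, int]:
    """Return the counters from one PLAY RECAP host line."""
    _, _, counters = line.partition(":")  # everything after the host name
    return {key: int(value) for key, value in _COUNTER.findall(counters)}


def recap_problems(lines: List[str]) -> List[str]:
    """Return the recap lines whose failed or unreachable counters are non-zero."""
    bad = []
    for line in lines:
        counters = parse_recap_line(line)
        if counters.get("failed", 0) or counters.get("unreachable", 0):
            bad.append(line.strip())
    return bad
```

Paste the host lines into a list and call `recap_problems(lines)`; an empty list means no host reported failed or unreachable tasks.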

Another side note (a recommendation): add a FIXME to the convert_index cell in object_detection.ipynb. I missed the need to edit the class mappings, and I could have avoided this problem had I configured my job correctly.

if convert_action == "convert_and_index":
    # FIXME - check class mapping
    # Change this to the classes your dataset has
    specs["target_class_mapping"] = [
        {"key": "pedestrian", "value": "pedestrian"},
        # ... one {"key": ..., "value": ...} entry per class in your dataset
    ]
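The class-mapping mistake can also be caught before conversion by comparing the class names in the KITTI label files against the mapping keys. A rough sketch, assuming KITTI label lines whose first field is the class name and a case-insensitive comparison (an assumption; adjust if your pipeline is case-sensitive); `unmapped_classes` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Set


def unmapped_classes(label_lines: Iterable[str],
                     target_class_mapping: List[Dict[str, str]]) -> Set[str]:
    """Return class names found in KITTI label lines that have no mapping key.

    Assumption: names are compared case-insensitively.
    """
    mapped = {entry["key"].lower() for entry in target_class_mapping}
    found = set()
    for line in label_lines:
        fields = line.split()
        if fields:  # the first KITTI field is the object class name
            found.add(fields[0].lower())
    return found - mapped
```

For example, collect the lines with `[l for p in pathlib.Path(label_dir).glob("*.txt") for l in p.read_text().splitlines()]`, where `label_dir` is your own label directory; any name returned has no entry in `target_class_mapping`.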

After the install, I verified the following:

hostname -i

kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
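These two commands provide the pieces of the API host URL used by the notebook: the node's IP address and a node port. A small sketch that combines their outputs, assuming the first address printed by `hostname -i` is one the notebook machine can reach (on some Ubuntu setups it can be a loopback address, so check it); `build_host_url` is hypothetical. Use the ingress controller's nodePort from the command above, or the API service's nodePort if you applied the NodePort workaround discussed later in this thread.

```python
def build_host_url(hostname_i_output: str, node_port_output: str) -> str:
    """Combine `hostname -i` output and a kubectl nodePort value into a host URL."""
    addresses = hostname_i_output.split()
    if not addresses:
        raise ValueError("hostname -i returned no address")
    # jsonpath output may keep the surrounding quotes if copied from a shell prompt
    port = node_port_output.strip().strip("'")
    if not port.isdigit():
        raise ValueError(f"unexpected nodePort output: {node_port_output!r}")
    # hostname -i may print several addresses; this sketch simply takes the first one
    return f"http://{addresses[0]}:{int(port)}"
```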

kubectl get pods
NAME                                               READY   STATUS    RESTARTS   AGE
ingress-nginx-controller-5ff6555d5d-hbbs4          1/1     Running   0          3m
nfs-subdir-external-provisioner-5f9cbb4554-9pfxl   1/1     Running   0          2m55s
nvidia-smi-5950x                                   1/1     Running   0          2m52s
tao-toolkit-api-app-pod-54c9c75fbc-brrzp           1/1     Running   0          2m50s
tao-toolkit-api-workflow-pod-55b9bfc948-dndxz      1/1     Running   0          2m50s
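The same visual check can be scripted. A minimal sketch that parses plain `kubectl get pods` output (column order NAME, READY, STATUS as above) and returns pods that are not fully ready or not Running; `pods_not_ready` is hypothetical, and pods from completed Jobs would also be reported.

```python
from typing import List


def pods_not_ready(kubectl_get_pods_output: str) -> List[str]:
    """Return names of pods that are not fully ready or not in the Running state."""
    problems = []
    lines = kubectl_get_pods_output.strip().splitlines()
    for line in lines[1:]:  # skip the NAME/READY/STATUS header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")  # e.g. "1/1"
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems
```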

kubectl get services
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller   NodePort    <none>        80:32080/TCP,443:32443/TCP   3m
kubernetes                 ClusterIP        <none>        443/TCP                      6m49s
tao-toolkit-api-service    ClusterIP   <none>        8000/TCP                     2m50s
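In this listing `tao-toolkit-api-service` is still a `ClusterIP` service (8000/TCP, no node port). A ClusterIP service gets no node port, so nothing answers on `<node-ip>:31951` until the service is changed to NodePort (compare the later listing, where it shows 8000:31951/TCP). A sketch of checking this from `kubectl get service tao-toolkit-api-service -o json` output, assuming the API port is 8000 as shown; `api_node_port` is hypothetical.

```python
import json
from typing import Optional


def api_node_port(service_json: str, target_port: int = 8000) -> Optional[int]:
    """Return the nodePort exposing `target_port`, or None if the Service is not NodePort."""
    service = json.loads(service_json)
    spec = service.get("spec", {})
    if spec.get("type") != "NodePort":
        return None  # e.g. ClusterIP: no port is opened on the node's address
    for port in spec.get("ports", []):
        if port.get("port") == target_port:
            return port.get("nodePort")
    return None
```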

The cluster and associated services seem fine.
From my notebook (basically the original, but I changed some print statements to help me troubleshoot):

Setting up Connection

(adjusted per the known connection issue)

response = requests.get(f"{host_url}/api/v1/login/{ngc_api_key}")
print (f"response: {response}")
user_id = str(uuid.uuid4())
print (f"HOST generated userid: {user_id}")
token = "whatever"
print (f"token doesn't matter: {token}")

# set base URL
base_url = f"{user_id}"
headers = {"Authorization": f"Bearer {token}"}
print (f"API Calls will be forwarded to: {base_url}")
print (f"headers: {headers}")
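A malformed base_url only surfaces later as a request error, so it can help to validate it when it is built. A minimal sketch that checks host_url and builds the per-user prefix, assuming the `/api/v1/user/<user_id>` path visible in the failing request URL further down; `make_base_url` is hypothetical.

```python
from urllib.parse import urlparse


def make_base_url(host_url: str, user_id: str) -> str:
    """Build the per-user API prefix and sanity-check host_url before any request is sent."""
    parsed = urlparse(host_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"host_url must look like http://<ip>:<port>, got {host_url!r}")
    # Path layout assumed from the request URL in the ConnectionError in this thread
    return f"{host_url.rstrip('/')}/api/v1/user/{user_id}"
```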



response: <Response [401]>
HOST generated userid: 4b6fb64c-5a26-4aef-ad3c-650a2d8220fb
token doesn't matter: whatever

API Calls will be forwarded to:[]

headers: {'Authorization': 'Bearer whatever'}

First TAO API Operation

# Create train dataset
# response 201 == success!
data = json.dumps({"type":ds_type,"format":ds_format})
endpoint = f"{base_url}/dataset"

print (f"endpoint: {endpoint}")
print (f"data: {data}")
print (f"headers: {headers}")

response = requests.post(endpoint,data=data,headers=headers)
dataset_id = response.json()["id"]
print (f'dataset_id: {dataset_id}')
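The cell reads `response.json()["id"]` unconditionally, so a 401 or other error response typically ends in a `KeyError` or a JSON decoding error that hides the server's message. A small sketch that checks for the 201 status named in the cell's comment first; `dataset_id_from_response` is hypothetical.

```python
from typing import Any, Dict


def dataset_id_from_response(status_code: int, payload: Dict[str, Any]) -> str:
    """Return the new dataset id, or raise with the server's reply if creation failed."""
    if status_code != 201:  # the notebook comment says 201 means success
        raise RuntimeError(f"dataset creation failed with HTTP {status_code}: {payload}")
    return payload["id"]
```

In the cell this would be called as `dataset_id = dataset_id_from_response(response.status_code, response.json())`.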


data: {"type": "object_detection", "format": "kitti"}
headers: {'Authorization': 'Bearer whatever'}

w/ (since the install referenced
ConnectionError: HTTPConnectionPool(host='', port=31951): Max retries exceeded with url: /api/v1/user/e8990b89-013b-42b5-8fe3-15e1f702275d/dataset (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5dcc1d7748>: Failed to establish a new connection: [Errno 111] Connection refused',))
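`[Errno 111] Connection refused` means nothing accepted a TCP connection at that host and port, so no change to the token or URL path can fix it; the port itself has to be reachable. A quick standard-library check from the notebook machine, where `port_is_open` is hypothetical and the host and port are whatever your host_url uses:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError (errno 111), timeouts, unreachable hosts
        return False
```

A `False` from `port_is_open("<node-ip>", 31951)` points at the Kubernetes service or the port number rather than the notebook code.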

I also tried matching the original instructions; same result.

I also tried in a browser:

Connection refused in both cases.

Empty braces (as expected; a good thing?)

So basically, my cluster appears okay but I can’t connect. What did I miss? Thanks for all of your help.

This is expected, as mentioned in the topic "Tao Toolkit API cannot login and got 401 unauthorized".


Could you double-check the steps mentioned in the workaround topic above?

Thanks again for your help on this.
I think I may have forgotten to edit the service per the workaround after the re-install…

I should have reverified:

kubectl get services
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller   NodePort   <none>        80:32080/TCP,443:32443/TCP   75m
kubernetes                 ClusterIP        <none>        443/TCP                      79m
tao-toolkit-api-service    NodePort    <none>        8000:31951/TCP               75m

I had definitely transposed two port digits. After fixing that, I'm now connected correctly.
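Since a transposed digit on either side gives the same refused connection, comparing the port in host_url with the service's nodePort directly catches it. A one-function sketch; `host_url_matches_node_port` is hypothetical, and the node port would come from `kubectl get services` as above.

```python
from urllib.parse import urlparse


def host_url_matches_node_port(host_url: str, node_port: int) -> bool:
    """Return True if the port in host_url equals the Service's nodePort."""
    # urlparse(...).port is None when the URL has no explicit port
    return urlparse(host_url).port == node_port
```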

I should also have noted, for the next person:
When I reinstalled (bash setup.sh install), the console referenced:
(the first install referenced

Consequently, I changed base_url = f'{user_id}
(not – this worked fine

Thanks for the info. Glad to know it is working now.
