Subject: TAO Toolkit API 6.0.0 Deployment on Kubernetes → MongoDB Authentication Failed
Hardware:
- 2 × Quadro RTX 6000
- Driver: 525.105.17
- CUDA: 12.0
Network Type:
N/A – This issue occurs during TAO Toolkit API deployment (not related to training network).
TLT Version:
docker_tag: 6.0.0-pyt
(from Helm values.yaml → nvcr.io/nvidia/tao/tao-toolkit:6.0.0-pyt)
Training spec file:
N/A – issue occurs before any training jobs can be submitted.
How to Reproduce:
- Followed the official instructions for TAO Toolkit API deployment on bare-metal Kubernetes: TAO Toolkit API Deployment – Bare Metal Setup
- Installed the Helm chart (tao-toolkit-api-6.0.0-multi-node.tgz) with backend=local-k8s and hostPlatform=local.
- MongoDB StatefulSet comes up healthy (3 replicas running).
- TAO API app/workflow pods fail to initialize due to MongoDB authentication errors.
Cluster / Environment Info:
- Kubernetes Client: v1.34.1, Server: v1.34.0
- OS: Ubuntu 18.04.6 LTS (Linux ESIND-S2600WFT 5.4.0-150-generic)
Pods Status:
kubectl get pods
mongodb-0 1/1 Running
mongodb-1 1/1 Running
mongodb-2 1/1 Running
tao-api-app-pod-55d97f8f5b-4vwcm 0/1 Init:0/2 1397 restarts
tao-api-app-pod-649885dd69-4zkhb 0/1 Init:0/2 1398 restarts
tao-api-workflow-pod-7bb574b857-9vgm6 0/1 CrashLoopBackOff 2158 restarts
Error Logs from tao-api-app-pod (mongodb-init container):
2025-10-02 05:18:44,197 - handlers.mongo_handler - ERROR - Exception in __init__: name 'mongo_client' is not defined
2025-10-02 05:19:44,250 - __main__ - ERROR - Error initializing replicaset! Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}
2025-10-02 05:20:44,285 - __main__ - ERROR - Error initializing replicaset! Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}
Error Logs from tao-api-workflow-pod:
2025-10-02 05:21:05,410 - nvidia_tao_core.microservices.handlers.mongo_handler - ERROR - Exception in __init__: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}
2025-10-02 05:21:35,416 - nvidia_tao_core.microservices.handlers.mongo_handler - ERROR - Exception in __init__: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}
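To separate a plain credential mismatch from a chart initialization problem, a check along these lines could be run from a pod inside the cluster. This is only a sketch under assumptions: it uses pymongo's `MongoClient` directly rather than anything from the TAO code, and the username, password, and host name shown in the usage comment are placeholders, not values taken from the Helm chart. `directConnection=True` is set so that a single member can be queried even if the replica set has not been initialized.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port=27017, auth_db="admin"):
    """Build a MongoDB URI, percent-escaping the credentials."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={auth_db}"
    )


def check_auth(uri, timeout_ms=5000):
    """Return True if the credentials authenticate, False on error code 18."""
    # pymongo is imported lazily so build_mongo_uri works without it installed.
    from pymongo import MongoClient
    from pymongo.errors import OperationFailure

    client = MongoClient(
        uri,
        serverSelectionTimeoutMS=timeout_ms,
        directConnection=True,  # talk to this one member only
    )
    try:
        client.admin.command("ping")
        return True
    except OperationFailure as exc:
        if exc.code == 18:  # AuthenticationFailed, the code seen in the logs
            return False
        raise
    finally:
        client.close()


# Hypothetical usage; replace the placeholders with the real values:
# uri = build_mongo_uri("<username>", "<password>", "mongodb-0")
# print(check_auth(uri))
```

If this returns False with the credentials the API pods are given, the users were likely never created on the MongoDB side; if it returns True, the problem is more likely in how the pods receive their credentials.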
Helm Values (relevant snippets):
backend: local-k8s
hostPlatform: local
mongoOperatorEnabled: false
mongoDesiredReplicas: 3
imageMongo: mongo
My Questions:
- When mongoOperatorEnabled=false, do I need to manually configure MongoDB users/replica set authentication for TAO Toolkit API?
- Is there a specific secret (username/password) the TAO API expects for the Mongo connection?
- Or should the Helm chart handle Mongo initialization out-of-the-box?
Any guidance, or a working example values.yaml for a local-k8s deployment without the Mongo operator, would be greatly appreciated.