Hi, thanks for your response. I had already changed numGpus, but I still can't use two GPUs for parallel training.
Attached are the GPU usage I monitored while the training task was running and the kubectl pod description.
Is there any other config I should change to use multiple GPUs for a single training task?
Are you running getting_started_v4.0.0/notebooks/tao_api_starter_kit/api/automl/classification.ipynb or getting_started_v4.0.0/notebooks/tao_api_starter_kit/api/end2end/multiclass_classification.ipynb?
Currently the end2end classification notebook does not support multi-GPU training; this feature will be enabled in TAO 5.0 soon.
You can try the AutoML notebook, which can launch a training task on multiple GPUs and produce a better-performing model with automatically selected hyperparameters.
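To confirm whether a training task is really using both GPUs, something like the following could be run on the GPU node while the job is active. This is only a rough sketch, not part of TAO: it assumes nvidia-smi is on the PATH and reports numeric values for every GPU, and the helper names (parse_gpu_usage, busy_gpus) and the 10% utilization threshold are made up for illustration.

```python
import subprocess

# Standard nvidia-smi query: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_usage(csv_text):
    """Parse rows like '0, 97, 10240' into dicts (hypothetical helper)."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, util, mem = [field.strip() for field in line.split(",")]
        rows.append({"index": int(index), "util_pct": int(util), "mem_mib": int(mem)})
    return rows


def busy_gpus(rows, util_threshold=10):
    """Return indices of GPUs at or above the (assumed) utilization threshold."""
    return [r["index"] for r in rows if r["util_pct"] >= util_threshold]


if __name__ == "__main__":
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    rows = parse_gpu_usage(out)
    print("GPUs in use:", busy_gpus(rows), "of", len(rows))
```

If only one index is listed while the job is training, the task is most likely running on a single GPU regardless of the numGpus setting.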