Error when training with multiple GPUs in TAO

I’m pretty confused right now. I proceeded step by step as follows:

  1. I followed the “Launcher CLI” instructions in the guide: TAO Toolkit Quick Start Guide - NVIDIA Docs
  2. After that I start Jupyter with: jupyter notebook --no-browser --port=8080 --allow-root
  3. Since I’m accessing the machine remotely, I open a new terminal and forward the port with: ssh -L 8080:localhost:8080 user@192.168.188.25
  4. I run all the steps as described in: yolo_v4.ipynb
  5. I get the error when trying to train with more than 1 GPU

When I follow your approach and try to start Jupyter inside the Docker container, I get the error “Connection refused”. So I mounted my previous project in Docker and ran your training command. This works with 1 GPU but not with 8 (see Error when training with multiple GPUs in TAO - #2 by Morganh).
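
To narrow down the “Connection refused” error, a check like the one below could show whether anything is actually listening on the forwarded port. This is only a minimal sketch: the helper name port_is_open is hypothetical, and the host and port (localhost, 8080) are taken from the steps above. It only tests TCP reachability and says nothing about why Jupyter inside the container might not be reachable.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; it is not part of TAO or Jupyter.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open("localhost", 8080) on the local machine while the SSH tunnel is up would return False if the connection is refused or times out, which at least separates a tunnel or port problem from a training problem.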

I get completely different error messages and it doesn’t work at all: Error when training with multiple GPUs in TAO - #6 by Morganh

I’m sorry, but could you please explain the steps in a bit more detail?