There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
In the topic "Error when training with multiple GPUs in TAO", that user was able to run 8 GPUs successfully with the 22.05 docker.
Also, since you mentioned above that “With tensorrt docker and ran nccl inside it with nvidia driver 515, still the same issue”, I am afraid the issue is related to the topology (topo).