TAO API - Detectnet_v2 - Multi GPU Stuck

I tried to follow the steps from this other post:

However, the test gets stuck as soon as it starts. I don't know how long it should take to finish; I have been waiting for more than 15 minutes with no progress or results.
The GPUs are locked at 100% of their clock frequency.
The nvidia-smi output is attached below:

root@9ea20d6ac6f2:/workspace/nccl-tests# nvidia-smi
Fri May 26 11:20:48 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX 6000...  Off  | 00000000:21:00.0 Off |                  Off |
| 32%   54C    P8    32W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX 6000...  Off  | 00000000:22:00.0 Off |                  Off |
| 37%   58C    P8    41W / 300W |      0MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here is the log of the NCCL test:

root@9ea20d6ac6f2:/workspace/nccl-tests# ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2
# nThread 1 nGpus 2 minBytes 8 maxBytes 134217728 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid   1034 on 9ea20d6ac6f2 device  0 [0x21] NVIDIA RTX 6000 Ada Generation
#  Rank  1 Group  0 Pid   1034 on 9ea20d6ac6f2 device  1 [0x22] NVIDIA RTX 6000 Ada Generation
9ea20d6ac6f2:1034:1034 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
9ea20d6ac6f2:1034:1034 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v6 symbol.
9ea20d6ac6f2:1034:1034 [0] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin (v5)
9ea20d6ac6f2:1034:1034 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
9ea20d6ac6f2:1034:1034 [0] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v5)
9ea20d6ac6f2:1034:1034 [1] NCCL INFO cudaDriverVersion 12000
NCCL version 2.15.5+cuda11.8
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
9ea20d6ac6f2:1034:1043 [0] NCCL INFO P2P plugin IBext
9ea20d6ac6f2:1034:1043 [0] NCCL INFO NET/IB : No device found.
9ea20d6ac6f2:1034:1043 [0] NCCL INFO NET/IB : No device found.
9ea20d6ac6f2:1034:1043 [0] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Using network Socket
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Using network Socket
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 00/04 :    0   1
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 01/04 :    0   1
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] 0/-1/-1->1->-1 [2] -1/-1/-1->1->0 [3] 0/-1/-1->1->-1
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 02/04 :    0   1
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 03/04 :    0   1
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] -1/-1/-1->0->1 [2] 1/-1/-1->0->-1 [3] -1/-1/-1->0->1
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Channel 00/0 : 1[22000] -> 0[21000] via P2P/direct pointer
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 00/0 : 0[21000] -> 1[22000] via P2P/direct pointer
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Channel 01/0 : 1[22000] -> 0[21000] via P2P/direct pointer
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 01/0 : 0[21000] -> 1[22000] via P2P/direct pointer
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Channel 02/0 : 1[22000] -> 0[21000] via P2P/direct pointer
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 02/0 : 0[21000] -> 1[22000] via P2P/direct pointer
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Channel 03/0 : 1[22000] -> 0[21000] via P2P/direct pointer
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 03/0 : 0[21000] -> 1[22000] via P2P/direct pointer
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Connected all rings
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Connected all rings
9ea20d6ac6f2:1034:1044 [1] NCCL INFO Connected all trees
9ea20d6ac6f2:1034:1044 [1] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
9ea20d6ac6f2:1034:1044 [1] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer
9ea20d6ac6f2:1034:1043 [0] NCCL INFO Connected all trees
9ea20d6ac6f2:1034:1043 [0] NCCL INFO threadThresholds 8/8/64 | 16/8/64 | 512 | 512
9ea20d6ac6f2:1034:1043 [0] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer
9ea20d6ac6f2:1034:1044 [1] NCCL INFO comm 0x556ce5af0be0 rank 1 nranks 2 cudaDev 1 busId 22000 - Init COMPLETE
9ea20d6ac6f2:1034:1043 [0] NCCL INFO comm 0x556ce5aee150 rank 0 nranks 2 cudaDev 0 busId 21000 - Init COMPLETE
#
#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
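
The log stops here: the table header is printed, but no result rows follow. To make a long NCCL log like this easier to read, here is a minimal Python sketch that counts which transport each rank pair uses and which ranks reported "Init COMPLETE". The function names (summarize_transports, init_completed) are hypothetical helpers written for this post, not part of NCCL or nccl-tests, and the regular expressions assume the NCCL INFO line format shown above.

```python
import re
from collections import Counter

# Matches NCCL INFO lines of the form seen in the log above, e.g.
#   ... NCCL INFO Channel 00/0 : 1[22000] -> 0[21000] via P2P/direct pointer
# (assumption: other NCCL versions may format these lines differently)
CHANNEL_RE = re.compile(
    r"NCCL INFO Channel (\d+)/\d+ : (\d+)\[\w+\] -> (\d+)\[\w+\] via (.+?)\s*$"
)

# Matches the per-rank line that ends communicator setup.
INIT_RE = re.compile(r"rank (\d+) nranks \d+ .*Init COMPLETE")


def summarize_transports(log_text):
    """Count channels per (source rank, destination rank, transport)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CHANNEL_RE.search(line)
        if match:
            _channel, src, dst, transport = match.groups()
            counts[(int(src), int(dst), transport)] += 1
    return counts


def init_completed(log_text):
    """Return the sorted list of ranks that reported 'Init COMPLETE'."""
    return sorted(int(rank) for rank in INIT_RE.findall(log_text))


if __name__ == "__main__":
    # A few lines copied from the log above, used as sample input.
    sample = "\n".join([
        "9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 00/04 :    0   1",
        "9ea20d6ac6f2:1034:1044 [1] NCCL INFO Channel 00/0 : 1[22000] -> 0[21000] via P2P/direct pointer",
        "9ea20d6ac6f2:1034:1043 [0] NCCL INFO Channel 00/0 : 0[21000] -> 1[22000] via P2P/direct pointer",
        "9ea20d6ac6f2:1034:1044 [1] NCCL INFO comm 0x556ce5af0be0 rank 1 nranks 2 cudaDev 1 busId 22000 - Init COMPLETE",
        "9ea20d6ac6f2:1034:1043 [0] NCCL INFO comm 0x556ce5aee150 rank 0 nranks 2 cudaDev 0 busId 21000 - Init COMPLETE",
    ])
    for (src, dst, transport), n in sorted(summarize_transports(sample).items()):
        print(f"rank {src} -> rank {dst}: {n} channel(s) via {transport}")
    print("ranks with Init COMPLETE:", init_completed(sample))
```

Applied to the full log above, this would show all four channels in each direction going over "P2P/direct pointer" and both ranks finishing initialization, so the hang appears to happen after setup, at the point where the first all-reduce data should move between the two GPUs.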