RuntimeError: NCCL Error 3: internal error - please report this issue to the NCCL developers

Hi, I'm running the "Fine-tune Llama 2 with LoRA for Question Answering" example on a Standard NC80adis H100 v5 VM on Azure (80 vCPUs, 640 GiB memory, 2 NVIDIA H100 GPUs). During training, the call to fine_tuning.train() fails with RuntimeError: NCCL Error 3: internal error - please report this issue to the NCCL developers. nvidia-smi on the VM reports NVIDIA-SMI 550.54.15, Driver Version 550.54.15, CUDA Version 12.4. While debugging I can see that the NCCL version in use is 2.19.3+cuda12.3. I tried building NVIDIA/nccl from source, and I also tried the build from the official NVIDIA website, but neither fixed the error. Any hints on how this can be fixed? Thanks in advance.
NB: The same script runs successfully on a Standard NC40ads H100 v5 instance (40 vCPUs, 320 GiB memory) with a single NVIDIA H100.
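
For anyone trying to reproduce this, here is a minimal sketch of how the version details above could be gathered from inside the training environment, assuming the training stack is PyTorch (the post does not state this explicitly). The helper name `collect_env_report` is hypothetical and not part of the example being run. The sketch also sets `NCCL_DEBUG=INFO`, which asks NCCL to print more detailed logs and may show what lies behind the generic "internal error".

```python
import os

# Ask NCCL for verbose logs. This must be set before the process group
# (and therefore NCCL) is initialized, so do it at the top of the script.
os.environ.setdefault("NCCL_DEBUG", "INFO")


def collect_env_report():
    """Hypothetical helper: gather version info relevant to a multi-GPU NCCL failure."""
    report = {"NCCL_DEBUG": os.environ.get("NCCL_DEBUG")}
    try:
        import torch  # assumption: the training stack uses PyTorch
    except ImportError:
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda        # CUDA version PyTorch was built against
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()  # expected to be 2 on the failing VM
    try:
        report["nccl"] = torch.cuda.nccl.version()   # NCCL version bundled with PyTorch
    except Exception:
        report["nccl"] = None                        # NCCL not available in this build
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Running this on both the 2-GPU and the 1-GPU VM and comparing the output would show whether the two environments differ in anything other than the GPU count.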