• WSL 2 Ubuntu (20.04)
• CUDA 11.6
• Hardware (Asus TUF Dash 15 / GeForce RTX 3060)
• Network Type (Yolo_v3 from cv_samples)
• TAO Version (3.22.02)
• NCCL Version (2.12.10)
I’m following the “yolo_v3” notebook from the CV samples on the NVIDIA TAO tutorials page. At the end of the first epoch, the following error appears:
self._traceback = tf_stack.extract_stack()
tao_stack_log.txt (62.9 KB)
In the log, the TAO Toolkit, CUDA, and NCCL versions are not detected correctly. My setup has CUDA 11.6, NCCL 2.12, and TAO Toolkit 3.22.02, but the log reports NCCL version 2.9.9+cuda11.3 and the image nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3. Even after I updated NCCL manually, the toolkit still does not pick up the correct version.
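To make the mismatch easier to check, here is a minimal sketch of how the versions reported in the log could be pulled out. It assumes the log lines look like the two I quoted above. The function name `reported_versions` and the sample text are only illustrative, and I have not verified the patterns against every line of the attached log.

```python
import re

# Patterns based on the two strings quoted from the log (assumed format).
NCCL_RE = re.compile(r"NCCL version (\d+\.\d+\.\d+)\+cuda(\d+\.\d+)")
IMAGE_RE = re.compile(r"nvcr\.io/nvidia/tao/tao-toolkit-tf:(v[\w.\-]+)")

def reported_versions(log_text):
    """Return the (NCCL, CUDA) pairs and TAO image tags found in the log text."""
    nccl = set(NCCL_RE.findall(log_text))
    images = set(IMAGE_RE.findall(log_text))
    return nccl, images

# Sample built from the two strings quoted above, not the full log.
sample = (
    "NCCL version 2.9.9+cuda11.3\n"
    "nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3\n"
)
nccl, images = reported_versions(sample)
print(nccl, images)
```

The same function could be run on the contents of tao_stack_log.txt to compare what the toolkit reports against what is installed on the host.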
Besides that, there are always two root errors, both of which are Unknown: ncclCommInitRank failed: unhandled system error. I’ve uploaded the log file to help diagnose the problem. Any help would be appreciated. Thanks for your advice.
Best
Alper