Does NCCL 2.4 use Hierarchical All-Reduce by default?

Does NCCL 2.4 use Hierarchical All-Reduce by default for multi-node, multi-GPU all-reduce?

By Hierarchical All-Reduce I mean an all-reduce that consists of three stages:

  1. Intra-node reduce
  2. Inter-node all-reduce
  3. Intra-node broadcast
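
To make the three stages concrete, here is a minimal sketch in plain Python that simulates them on nested lists. It only illustrates the data flow I am asking about. It does not call NCCL and is not meant to describe how NCCL implements anything. The function name `hierarchical_all_reduce` and the `buffers[node][gpu]` layout are my own assumptions for illustration.

```python
from typing import List


def hierarchical_all_reduce(
    buffers: List[List[List[float]]],
) -> List[List[List[float]]]:
    """Simulate a sum all-reduce over buffers[node][gpu] (hypothetical layout).

    Every GPU buffer must have the same length. The result has the same
    shape, with every GPU holding the element-wise global sum.
    """
    # Stage 1: intra-node reduce. Each node sums its GPUs' buffers
    # element-wise into one partial result (held by a local root).
    node_sums = [[sum(vals) for vals in zip(*node)] for node in buffers]

    # Stage 2: inter-node all-reduce. The per-node partial results are
    # summed, so every local root ends up with the global result.
    global_sum = [sum(vals) for vals in zip(*node_sums)]

    # Stage 3: intra-node broadcast. Each local root copies the global
    # result to every GPU on its node.
    return [[list(global_sum) for _ in node] for node in buffers]


if __name__ == "__main__":
    # Two nodes with two GPUs each, every GPU holding a 2-element buffer.
    data = [
        [[1.0, 2.0], [3.0, 4.0]],  # node 0
        [[5.0, 6.0], [7.0, 8.0]],  # node 1
    ]
    print(hierarchical_all_reduce(data))
```

In this sketch only one buffer per node takes part in stage 2, which is the property that reduces inter-node traffic compared with a flat all-reduce across all GPUs.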

Thanks a lot!