No transfers visible on NVLink with NCCL all_sum on p3.8xlarge

Here’s the code:

import tensorflow as tf
from tensorflow.contrib.nccl import all_sum

with tf.device('/gpu:0'):
    a = tf.get_variable(
        "a", initializer=tf.constant(1.0, shape=(args.dim, args.dim)))

with tf.device('/gpu:1'):
    b = tf.get_variable(
        "b", initializer=tf.constant(2.0, shape=(args.dim, args.dim)))

with tf.device('/gpu:0'):
    summed_node = all_sum([a, b])

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,

init = tf.global_variables_initializer()

with tf.device('/gpu:0'):
    summed =

My machine is an AWS p3.8xlarge instance. My understanding is that this configuration supports NVLink.

The code runs without errors, but when I query the NVLink counters with

nvidia-smi nvlink -g 0 -i 0

the link Tx/Rx counts are zero.
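
To rule out reading the counters at the wrong moment, here is a minimal sketch of how they could be compared immediately before and after the run. It only compares the raw text returned by the same nvidia-smi command as above and does not parse it, since I am not assuming any particular output format. The helper names snapshot and counters_changed are made up for illustration.

```python
import subprocess

# Same query as in the post: counter set 0 on GPU 0.
COUNTER_CMD = ["nvidia-smi", "nvlink", "-g", "0", "-i", "0"]


def snapshot(cmd=COUNTER_CMD):
    """Run the counter query and return its raw text output."""
    return subprocess.check_output(cmd, universal_newlines=True)


def counters_changed(workload, cmd=COUNTER_CMD):
    """Snapshot the counters, run workload(), snapshot again.

    Returns (changed, before, after), where changed is True if the
    raw command output differs between the two snapshots.
    """
    before = snapshot(cmd)
    workload()
    after = snapshot(cmd)
    return before != after, before, after
```

It would be called with the session and node from my code, for example counters_changed(lambda: sess.run(summed_node)), and the before/after text printed side by side.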

NVIDIA team, could you please look into this? I would be grateful. Thank you.