Tesla V100 Performance (AWS p3.16xlarge)

Greetings all,

We’re testing on an AWS p3.16xlarge with TensorFlow and ImageNet ILSVRC2012 using ResNet-50. The system runs Ubuntu with the NGC 18.05-py3 container, reading from both an NFS mount and an EBS io1 volume, and the throughput we’re getting compared to our on-prem DGX-1 is nothing short of laughable. With the DGX-1 we get about 4200 img/s with ResNet-50 and 4300 img/s with ResNet-50 v2; on AWS we’re not even close to those numbers with the V100 GPUs.

Our numbers max out at ~2400 img/s whether we use 4 or 8 GPUs, and with either the EBS or NFS mount. We’ve tried every TF argument without any real difference; it’s truly puzzling why we can’t leverage all 8 GPUs.

python3 tf_cnn_benchmarks.py --data_dir=/ml-demo-cv/image-2012/tfrecords --data_format=NCHW --batch_size=64
--num_batches=100 --model=resnet50 --optimizer=sgd --variable_update=replicated --use_fp16=True --nodistortions
--gpu_thread_mode=gpu_shared --gradient_repacking=2 --datasets_use_prefetch=False
--num_gpus=8 --data_name=imagenet

Done warm up
Step Img/sec total_loss
1 images/sec: 2551.4 +/- 0.0 (jitter = 0.0) 8.289
10 images/sec: 2544.3 +/- 10.8 (jitter = 23.0) 8.265
20 images/sec: 2547.8 +/- 25.1 (jitter = 58.5) 8.279
30 images/sec: 2598.3 +/- 22.0 (jitter = 89.9) 8.174
40 images/sec: 2625.9 +/- 19.1 (jitter = 129.6) 8.154
50 images/sec: 2570.7 +/- 23.4 (jitter = 145.6) 8.151
60 images/sec: 2532.9 +/- 23.1 (jitter = 201.5) 8.187
70 images/sec: 2496.6 +/- 22.6 (jitter = 250.1) 8.119
80 images/sec: 2472.6 +/- 21.6 (jitter = 267.8) 8.146
90 images/sec: 2463.4 +/- 20.2 (jitter = 257.2) 8.154
100 images/sec: 2437.5 +/- 20.5 (jitter = 266.8) 8.094

total images/sec: 2436.42
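One check we can run next (and would welcome thoughts on): tf_cnn_benchmarks generates synthetic input data when --data_dir is omitted, so a baseline with the same flags but no data directory should show whether the ~2400 img/s ceiling comes from storage I/O or from the GPUs themselves. A sketch, assuming the same script and flags as above:

```shell
# Synthetic-data baseline: identical flags, but no --data_dir, so the
# benchmark feeds fake ImageNet-shaped tensors instead of reading TFRecords.
# If this run is much faster than ~2400 img/s, the input pipeline
# (NFS/EBS reads) is the bottleneck rather than the GPUs.
python3 tf_cnn_benchmarks.py \
  --data_format=NCHW --batch_size=64 --num_batches=100 \
  --model=resnet50 --optimizer=sgd --variable_update=replicated \
  --use_fp16=True --nodistortions --gpu_thread_mode=gpu_shared \
  --gradient_repacking=2 --num_gpus=8
```

If the synthetic run also tops out around 2400 img/s, the problem is on the compute/scaling side (variable_update, NCCL, etc.) rather than the mounts.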

Looking for any feedback.

Thank you,

Nabil