AMP didn't convert any node to float16

Hi,

I’m trying to use AMP on an Nvidia container (nvcr.io/nvidia/tensorflow:19.06-py2).

I ran:

export TF_ENABLE_AUTO_MIXED_PRECISION=1

But when I tried the provided nvidia-examples, the auto_mixed_precision graph optimizer didn’t convert any node to float16.

2019-07-19 13:47:38.242470: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:38.243886: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:39.707156: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:39.708664: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:39.769731: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:39.770354: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:41.886982: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:41.888125: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:41.953587: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:41.954582: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:43.031187: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:43.032030: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:44.228865: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:44.229373: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:44.341795: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:44.343178: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:44.527141: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:44.528233: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
2019-07-19 13:47:44.621246: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:44.622308: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1248] No whitelist ops found, nothing to do
  Step Epoch Img/sec   Loss  LR
2019-07-19 13:47:45.004971: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 13:47:45.014521: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1741] Converted 0/738 nodes to float16 precision using 0 cast(s) to float16 (excluding Const and Variable casts)
2019-07-19 13:47:45.398637: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally

The example I chose is googlenet.py, but it didn’t work for any of the provided examples.

The reason AMP does not convert any nodes in the CNN examples is that they already run in (explicitly implemented) mixed precision by default, so there are no remaining float32 whitelist ops for the graph rewrite to convert.
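For illustration, here is a minimal NumPy sketch of the "explicit mixed precision" pattern those scripts implement by hand (the variable names and the toy update are mine, not from the nvidia-examples code): compute in float16, keep a float32 master copy of the weights, and scale the loss so small gradients don't underflow in float16.

```python
import numpy as np

loss_scale = 128.0                                 # static loss scale (illustrative value)
master_w = np.zeros((4, 4), dtype=np.float32)      # fp32 master copy of the weights
x = np.random.randn(8, 4).astype(np.float16)       # inputs held in fp16

for _ in range(3):
    w16 = master_w.astype(np.float16)              # cast weights to fp16 for compute
    y = x @ w16                                    # forward pass runs in fp16
    # toy fp16 "backward pass", with the gradient scaled up by loss_scale
    scaled_grad = (x.T @ np.ones_like(y)) * loss_scale
    # unscale in fp32 and apply the update to the fp32 master weights
    grad = scaled_grad.astype(np.float32) / loss_scale
    master_w -= 0.01 * grad
```

The compute-heavy matmul runs in float16, while the weight update happens in float32, which is exactly what AMP automates via graph rewriting instead of hand-written casts.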

python googlenet.py

Step Epoch Img/sec Loss LR
1 1.0 36.8 6.909 7.370 0.04000
10 10.0 492.8 6.904 7.365 0.03240
20 20.0 1564.9 6.896 7.357 0.02489
30 30.0 1564.7 6.856 7.317 0.01838
40 40.0 1567.3 6.150 6.610 0.01284
50 50.0 1567.7 5.426 5.887 0.00830
60 60.0 1567.7 5.419 5.880 0.00475
70 70.0 1561.9 5.418 5.878 0.00218
80 80.0 1568.4 5.418 5.878 0.00060
90 90.0 1250.0 5.418 5.878 0.00000

There is an option to run the model in pure fp32 instead. Notice the reduced throughput in the Img/sec column.

python googlenet.py --precision=fp32

Step Epoch Img/sec Loss LR
1 1.0 40.1 6.907 7.368 0.04000
10 10.0 445.4 6.901 7.361 0.03240
20 20.0 1035.6 6.874 7.335 0.02489
30 30.0 1033.3 6.261 6.721 0.01838
40 40.0 1034.4 5.397 5.857 0.01284
50 50.0 1034.4 5.400 5.860 0.00830
60 60.0 1034.5 5.391 5.851 0.00475
70 70.0 1033.9 5.388 5.849 0.00218
80 80.0 1033.0 5.388 5.848 0.00060
90 90.0 893.3 5.388 5.848 0.00000

Now enabling AMP on the fp32 model triggers the conversion and restores performance to nearly that of the explicitly mixed precision model. (For more computationally intensive models, the performance gap between AMP and explicit mixed precision shrinks even further.)

TF_ENABLE_AUTO_MIXED_PRECISION=1 python googlenet.py --precision=fp32

Step Epoch Img/sec Loss LR
2019-07-19 14:26:22.547554: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 14:26:22.579250: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1741] Converted 429/3950 nodes to float16 precision using 2 cast(s) to float16 (excluding Const and Variable casts)
1 1.0 35.1 6.907 7.368 0.04000
2019-07-19 14:26:29.874465: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:2005] Running auto_mixed_precision graph optimizer
2019-07-19 14:26:29.905673: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1741] Converted 429/3921 nodes to float16 precision using 2 cast(s) to float16 (excluding Const and Variable casts)
10 10.0 443.2 6.901 7.362 0.03240
20 20.0 1451.1 6.882 7.343 0.02489
30 30.0 1454.3 6.498 6.959 0.01838
30 30.0 1454.2 6.317 6.778 0.01778
40 40.0 1454.2 5.436 5.897 0.01284
50 50.0 1454.1 5.403 5.864 0.00830
60 60.0 1452.9 5.402 5.863 0.00475
70 70.0 1449.5 5.402 5.863 0.00218
80 80.0 1456.8 5.402 5.863 0.00060
90 90.0 1169.3 5.402 5.863 0.00000
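As a side note, the environment variable can also be set from inside the training script, as long as it happens before TensorFlow builds the session; this sketch shows the env-var route, with the in-code alternative (available in TF 1.14+, which the 19.06 container ships) noted in a comment:

```python
import os

# Equivalent to `export TF_ENABLE_AUTO_MIXED_PRECISION=1`: the flag must be
# set before the first session/graph is created so Grappler picks it up.
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

# In TF 1.14+ the same rewrite can instead be requested in code by wrapping
# the optimizer (sketch, not executed here):
#   import tensorflow as tf
#   opt = tf.train.GradientDescentOptimizer(0.01)
#   opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
```

Either way, you should then see non-zero conversion counts in the auto_mixed_precision log lines, as in the run above.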