Issue about no (suitable) GPUs detected when using mixed precision graph optimizer

I found a very strange issue with the mixed precision graph optimizer.
My container is tf19.04-py3
GPUs are several Titan Xs
CUDA 10
nvidia-driver 418.67

I ran the following code to test the mixed precision graph optimizer:

import os
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"
import tensorflow as tf
sess = tf.Session()
a = tf.constant(10)
b = tf.constant(12)
sess.run(a+b)
22

sess.run(a+b)
2019-05-27 21:49:18.757578: W tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1930] No (suitable) GPUs detected, skipping auto_mixed_precision graph optimizer
22

I’m very curious why it can’t detect suitable GPUs when I execute sess.run(a+b) for the SECOND time.

By the way, when I assign GPU resources at the very beginning:

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3,4"

it logs the same warning the FIRST time I execute sess.run(a+b).

I’m wondering how I can use the mixed precision graph optimizer correctly.

The auto mixed precision graph optimizer is only designed for GPUs of Volta generation (SM 7.0) or later, and if no such GPUs are detected (Titan X is pre-Volta) then it will print the message you see. This behavior can be overridden for testing purposes by setting the environment variable TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_IGNORE_PERFORMANCE=1, but the result may be a net loss in performance.
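
In case it helps, here is a rough sketch of how you could check this yourself. It is not an official utility: the helper names (enable_amp, compute_capability, is_suitable, local_gpu_descriptions) are made up for illustration, and it assumes the device description string reported by TF 1.x contains a "compute capability: X.Y" field. The two environment variables are the ones mentioned in this thread.

```python
import os
import re

# Volta is SM 7.0; the optimizer targets this generation or later.
VOLTA = (7, 0)


def enable_amp(ignore_performance=False):
    """Set the environment variables; call this before importing TensorFlow."""
    os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"
    if ignore_performance:
        # Testing-only override; may be a net performance loss on older GPUs.
        os.environ["TF_AUTO_MIXED_PRECISION_GRAPH_REWRITE_IGNORE_PERFORMANCE"] = "1"


def compute_capability(device_desc):
    """Return (major, minor) parsed from a device description, or None."""
    match = re.search(r"compute capability:\s*(\d+)\.(\d+)", device_desc)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_suitable(device_desc):
    """True if the described GPU is Volta (SM 7.0) or newer."""
    cc = compute_capability(device_desc)
    return cc is not None and cc >= VOLTA


def local_gpu_descriptions():
    """List description strings for visible GPUs (requires TensorFlow 1.x)."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return [d.physical_device_desc
            for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]
```

Calling enable_amp() first and then [is_suitable(d) for d in local_gpu_descriptions()] should show which of your visible GPUs the optimizer would accept; on pre-Volta cards every entry is expected to be False unless you pass ignore_performance=True for testing.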

It is likely that the graph optimizer was not invoked at all the first time, but I’m not sure why that was the case. Perhaps it was related to the number of nodes in the graph; does it reproduce if you do “c = a + b; sess.run(c); sess.run(c)”?

Thanks a lot for your reply! By the way, could you give me some advice on how to optimize my training in your TF container? I’m fine-tuning a BERT network.