Status of fp16 resnet-50 training?

What is the status of FP16 training of ResNet-50 on Volta GPUs using NGC containers? I’ve been told that 5k images/second is achievable on a DGX-1 Volta. Is this performance:

  1. achievable using TensorFlow from the NVIDIA NGC container?
  2. available for both training and inference?
  3. usable for actual training (i.e., training in FP16 end-to-end without significant accuracy loss)?

The NGC containers are based on the same development work as the DGX-1 containers. As to your questions:

  1. If you were to run the NGC container on a DGX-1, performance should be similar. Performance will vary based on your actual system architecture.
  2. Yes, but see #1 for caveats.
  3. Yes, you should be able to use mixed-precision training and not only get improved training performance but also see training converge in the same number of epochs. Again, convergence may vary depending on what exactly you are doing. However, for many models, you should see similar training accuracy. See the following link for details:

This blog entry references a couple of documents about accuracy with mixed-precision training that should explain in more detail what you can expect using FP16 and Tensor Cores.
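To illustrate why mixed-precision training typically keeps FP32 master weights and uses loss scaling: many gradient values produced during training are too small to represent in FP16 and silently underflow to zero. The sketch below (NumPy, not the NGC TensorFlow API; the value 1e-8 and scale 1024 are illustrative assumptions) shows the underflow and how scaling before the FP16 step, then unscaling in FP32, recovers the gradient:

```python
import numpy as np

# A small gradient value such as might come out of backprop.
# (1e-8 is below FP16's smallest subnormal, ~6e-8, so it underflows.)
tiny_grad = 1e-8
print(np.float16(tiny_grad))        # 0.0 -- the gradient is lost in FP16

# Loss scaling: multiply the loss (and hence all gradients) by a
# constant before the FP16 backward pass so small values stay nonzero.
loss_scale = 1024.0                 # illustrative choice of scale factor
scaled = np.float16(tiny_grad * loss_scale)
print(scaled)                       # nonzero -- representable in FP16

# Unscale in FP32 before applying the update to the FP32 master weights.
recovered = np.float32(scaled) / loss_scale
print(recovered)                    # approximately the original 1e-8
```

This is the basic idea behind the loss-scaling scheme described in the mixed-precision documents linked above; production implementations typically adjust the scale factor dynamically.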