Hi. I’m looking for some feedback about the question I posted here:
I know NVIDIA’s apex library brings a lot of features to help with training, but it’s also stated that inference will run faster with this library.
My question is: can I use apex for inference only, without re-training the FP32 model with mixed precision? And is there a real improvement from doing so, compared with just casting my model to half precision?
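For reference, here is a rough sketch of the two options being compared, both reusing FP32-trained weights with no retraining. It uses PyTorch's built-in `torch.autocast` rather than apex itself, on the assumption that native autocast covers the per-op mixed-precision casting that `apex.amp` provided. The toy `build_model` network is a hypothetical stand-in for the trained FP32 model. On CPU it falls back to `bfloat16`, because FP16 kernel coverage on CPU varies across PyTorch versions. Actual speedups, if any, depend on the GPU (e.g. FP16 Tensor Core support), and accuracy has to be checked on real data.

```python
import copy

import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical stand-in for an already-trained FP32 model.
    torch.manual_seed(0)
    return nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()


device = "cuda" if torch.cuda.is_available() else "cpu"
# FP16 on GPU; bfloat16 on CPU, where reduced-precision support is more consistent.
low_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model_fp32 = build_model().to(device)
x = torch.randn(8, 64, device=device)

with torch.inference_mode():
    # Baseline: plain FP32 inference.
    ref = model_fp32(x)

    # Option A: cast every weight and the input to reduced precision.
    # deepcopy keeps model_fp32 intact, since Module.to() modifies in place.
    model_half = copy.deepcopy(model_fp32).to(low_dtype)
    out_half = model_half(x.to(low_dtype))

    # Option B: keep FP32 weights and let autocast choose a dtype per op
    # (matmul-heavy ops like Linear run in low_dtype; numerically sensitive
    # ops stay in FP32).
    with torch.autocast(device_type=device, dtype=low_dtype):
        out_amp = model_fp32(x)
```

Comparing `out_half` and `out_amp` against `ref` on representative inputs, and timing each variant on the target GPU, is one way to check whether mixed precision buys anything over a plain cast for a given model.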