I’m trying to use apex automatic mixed precision (amp) on an ensemble of two models connected in series.
I’m testing opt_level = O2, and I do observe the model’s input and output being converted internally to half precision.
However, after the forward step, the input/output tensors used to calculate the loss are converted back to single precision.
I expected the data tensors and the model weights to stay in half precision all the way through, so that the optimizer works on 16-bit tensors. Is the conversion back to single precision the intended behavior?
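
To make concrete what "the optimizer works on 16-bit tensors" would mean numerically, here is a small sketch using only the Python standard library. It is not apex code: the helper names and the values (a weight of 1.0 and an update of 1e-4 standing in for lr * grad) are made up for illustration, and it only mimics a single weight update rounded to half versus single precision.

```python
import struct


def round_to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_single(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical values, chosen only to illustrate the rounding behavior.
weight = 1.0
update = 1e-4  # stands in for lr * grad

# One update step with the weight and update stored in half precision.
half_result = round_to_half(round_to_half(weight) - round_to_half(update))

# The same step with the weight and update stored in single precision.
single_result = round_to_single(round_to_single(weight) - round_to_single(update))

print("half precision weight changed:  ", half_result != weight)
print("single precision weight changed:", single_result != weight)
```

With these particular numbers the half-precision weight does not move at all, because the update is smaller than half the spacing between representable 16-bit values near 1.0, while the single-precision weight does change. If this kind of rounding is the reason for the conversion I’m seeing, that would explain it, but I’d like confirmation.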