Training in fp16 without gradients vanishing

I’ve been working on a convolutional neural network using cuDNN and cuBLAS, and I’ve hit a seemingly unsolvable problem: I have to use fp16 to get the tensor cores, but the gradients instantly vanish, particularly at the connection between the last convolutional layer and the first fully connected layer, where the loss gets divided by the number of parameters.
How does one perform the backward pass using the fp16 tensor cores on a Titan V while avoiding vanishing gradients?

It doesn’t matter what I do: either the values shoot off to infinity or the gradients vanish, it always happens along a logarithmic curve, and there doesn’t seem to be any way to stabilize it!
I’ve been messing with different activations, different alpha and beta values, different optimizers, gradient clipping, and loss scaling, but nothing works!
It’s like trying to balance a marble on the sharp end of a pin!

Why is nobody replying? Is this forum dead?