How to dynamically switch and measure usage of floating point precision

Is there a way to find out at runtime whether a device kernel is running in 16-, 32-, or 64-bit floating-point mode, or perhaps in mixed-precision mode? Can we force device code to run at a specified floating-point precision programmatically at runtime?

There are no precision modes, so there is nothing to switch at runtime. Precision is a property of each arithmetic instruction and is fixed at compile time by the data types the code uses (half, single, or double). To see how much computation occurs in each of the available precisions while the application runs, look at the flop_count* metrics of the CUDA profiler.
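The practical way to get a "runtime precision switch" is therefore to compile the same kernel for each type and choose which instantiation to launch. Below is a minimal sketch of that pattern; the kernel name `axpy`, the `Precision` enum, and the `run_axpy` helper are illustrative, not part of any CUDA API.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Same arithmetic, instantiated once per precision at compile time.
// For T = float the compiler emits FP32 instructions, for T = double FP64.
template <typename T>
__global__ void axpy(int n, T a, const T* x, T* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

enum class Precision { Single, Double };

// Host-side dispatch: the runtime choice only selects which instantiation
// to launch; it does not alter the precision of already-compiled instructions.
void run_axpy(Precision p, int n)
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;

    if (p == Precision::Single) {
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));
        cudaMemset(y, 0, n * sizeof(float));
        axpy<float><<<blocks, threads>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();
        cudaFree(x); cudaFree(y);
    } else {
        double *x, *y;
        cudaMalloc(&x, n * sizeof(double));
        cudaMalloc(&y, n * sizeof(double));
        cudaMemset(x, 0, n * sizeof(double));
        cudaMemset(y, 0, n * sizeof(double));
        axpy<double><<<blocks, threads>>>(n, 2.0, x, y);
        cudaDeviceSynchronize();
        cudaFree(x); cudaFree(y);
    }
}

int main()
{
    run_axpy(Precision::Single, 1 << 20);
    run_axpy(Precision::Double, 1 << 20);
    printf("done\n");
    return 0;
}
```

A half-precision variant can be added the same way with `__half` from `cuda_fp16.h` on GPUs that support half arithmetic (sm_53 and newer). To measure what actually ran, something like `nvprof --metrics flop_count_hp,flop_count_sp,flop_count_dp ./app` reports the floating-point operation counts per precision for each kernel; newer toolkits expose equivalent counters through Nsight Compute under different metric names.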