How can I find the clock cycles of each functional unit operation in the Ampere architecture?

Hi, where can I find documentation on the functional unit hardware in the Ampere architecture? For example, how many clock cycles does a single FP64 operation take on an FP64 functional unit in an Ampere GPU? If there is no exact documentation, what’s the best way to benchmark an approximate value for it?
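
For the benchmarking route, one possible approach is a dependent-chain microbenchmark. The sketch below is only an illustration under my own assumptions, not something taken from official documentation: a single thread runs a chain of FP64 FMAs in which each one needs the previous result, and `clock64()` is read before and after. The kernel name `fp64_fma_latency`, the chain length `kChain`, and the input values are arbitrary choices of mine. The number it prints approximates dependent-issue latency (not throughput) and includes some loop and timer overhead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of dependent FMAs in the timed chain (arbitrary, hypothetical choice).
constexpr int kChain = 4096;

// A single thread executes kChain FP64 FMAs, each depending on the previous
// result, so (elapsed cycles / kChain) approximates the FMA latency.
__global__ void fp64_fma_latency(double a, double b, double *out, long long *cycles)
{
    double x = a;
    long long start = clock64();
    #pragma unroll 64
    for (int i = 0; i < kChain; ++i) {
        x = fma(x, b, a);  // next iteration cannot start until x is ready
    }
    long long stop = clock64();
    *out = x;              // store the result so the chain is not optimized away
    *cycles = stop - start;
}

int main()
{
    double *d_out = nullptr;
    long long *d_cycles = nullptr;
    cudaMalloc(&d_out, sizeof(double));
    cudaMalloc(&d_cycles, sizeof(long long));

    // First launch is a warm-up; the second launch overwrites its result.
    for (int rep = 0; rep < 2; ++rep) {
        fp64_fma_latency<<<1, 1>>>(1.0, 0.5, d_out, d_cycles);
    }
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    double out = 0.0;
    long long cycles = 0;
    cudaMemcpy(&out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);

    // Inputs are passed as kernel arguments so the compiler cannot fold the chain.
    printf("approx. cycles per dependent FP64 FMA: %.2f (result %f)\n",
           static_cast<double>(cycles) / kChain, out);

    cudaFree(d_out);
    cudaFree(d_cycles);
    return 0;
}
```

This could be built with something like `nvcc -arch=sm_80 -o fp64_latency fp64_latency.cu` (the file name is hypothetical; `sm_80` assumes an A100-class part). It is probably worth checking the generated SASS with `cuobjdump -sass` to confirm that the timed region really contains the chain of double-precision FMA instructions and that the compiler has not moved work across the `clock64()` reads.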

