How can I find the clock-cycle latency of each functional unit operation in the Ampere architecture?

Hi, where can I find documentation on the functional unit hardware in the Ampere architecture? For example, how many clock cycles does a single FP64 operation take on an FP64 functional unit in an Ampere GPU? If there is no exact documentation, what’s the best way to benchmark an approximate value for it?

Moving this to the GPU Hardware category.
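
On the benchmarking part of the question: as far as I know, per-instruction latencies are not published. The CUDA C++ Programming Guide only lists arithmetic instruction throughput per SM per clock. A common way to get an approximate latency is a dependent-chain microbenchmark: one thread issues a long chain of FP64 FMAs where each one needs the previous result, and `clock64()` is read before and after. Below is a minimal sketch of that idea, not an official method. The kernel name `fp64_fma_latency`, the chain length `N_ITERS`, and the input values are my own choices for illustration. The result includes a little loop and timer overhead, so treat it as an estimate.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative chain length (an assumption, not a recommended value).
// Every FMA depends on the previous one, so the chain cannot overlap
// and elapsed cycles are roughly N_ITERS * latency.
constexpr int N_ITERS = 4096;

__global__ void fp64_fma_latency(double a, double b, double *out, long long *cycles)
{
    double x = a;
    long long start = clock64();
    #pragma unroll 16
    for (int i = 0; i < N_ITERS; ++i) {
        x = fma(x, b, a);   // dependent chain: next FMA needs this x
    }
    long long stop = clock64();
    *out = x;               // store the result so the loop is not optimized away
    *cycles = stop - start;
}

int main()
{
    double *d_out = nullptr;
    long long *d_cycles = nullptr;
    cudaMalloc(&d_out, sizeof(double));
    cudaMalloc(&d_cycles, sizeof(long long));

    // a and b are passed at run time so the compiler cannot fold the chain.
    // The first launch is a warm-up; the second one is the measurement.
    fp64_fma_latency<<<1, 1>>>(1.0, 0.999, d_out, d_cycles);
    fp64_fma_latency<<<1, 1>>>(1.0, 0.999, d_out, d_cycles);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    double h_out = 0.0;
    long long h_cycles = 0;
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaMemcpy(&h_cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);

    printf("result = %f\n", h_out);
    printf("total cycles = %lld, approx cycles per dependent FP64 FMA = %.2f\n",
           h_cycles, (double)h_cycles / N_ITERS);

    cudaFree(d_out);
    cudaFree(d_cycles);
    return 0;
}
```

Build with something like `nvcc -arch=sm_80 -o fp64_latency fp64_latency.cu` (use `sm_86` for GA10x parts). The compiler is allowed to move arithmetic across the clock reads, so it is worth checking the output of `cuobjdump -sass fp64_latency` to confirm that the chain of `DFMA` instructions really sits between the two clock reads. Numbers may also differ between GA100 and GA10x chips, since they have very different FP64 resources.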