Counting Floating Point Operations with nvprof

Suppose I know the number of predicated threads, for example that M threads will run in kernel X.

Is it possible to count the floating-point operations (FLOPs) of kernel X by reading the kernel's assembly code, counting the floating-point instructions one thread executes, and multiplying that count by M?

In other words, I want to estimate the FLOPs of kernel X for a given value of M.
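
To make the idea concrete, here is a rough sketch of the estimate I have in mind. It assumes the kernel's SASS has already been dumped to text, that an add or multiply counts as 1 FLOP and a fused multiply-add as 2, and that every thread runs each instruction exactly once (no loops, no divergent branches). The SASS listing and the function names are made up for illustration; only the opcode mnemonics (FADD, FMUL, FFMA, DADD, DMUL, DFMA) are real.

```python
import re

# Assumed weights: FLOPs contributed by one execution of each SASS opcode.
# A fused multiply-add is counted as 2 operations, add/mul as 1.
FLOPS_PER_OPCODE = {
    "FADD": 1, "FMUL": 1, "FFMA": 2,   # single precision
    "DADD": 1, "DMUL": 1, "DFMA": 2,   # double precision
}

def flops_per_thread(sass_text):
    """Statically count FLOPs in a SASS listing (each instruction counted once)."""
    total = 0
    for line in sass_text.splitlines():
        # Drop /* ... */ address and encoding comments.
        code = re.sub(r"/\*.*?\*/", "", line).strip()
        tokens = code.split()
        # Skip a leading predicate guard such as @P0 or @!P1.
        if tokens and tokens[0].startswith("@"):
            tokens = tokens[1:]
        if not tokens:
            continue
        # Strip modifiers: FADD.FTZ -> FADD.
        opcode = tokens[0].split(".")[0]
        total += FLOPS_PER_OPCODE.get(opcode, 0)
    return total

def estimate_flops(sass_text, num_threads):
    """Estimate total FLOPs as (FLOPs per thread) * M."""
    return flops_per_thread(sass_text) * num_threads

# Hypothetical listing, shaped like disassembler output but not from a real kernel.
SAMPLE_SASS = """
/*0000*/  MOV R1, c[0x0][0x28] ;
/*0010*/  FMUL R0, R2, R3 ;
/*0020*/  FFMA R0, R4, R5, R0 ;
/*0030*/  @P0 FADD.FTZ R0, R0, R6 ;
/*0040*/  EXIT ;
"""

if __name__ == "__main__":
    M = 1024  # number of threads that run the kernel
    print(estimate_flops(SAMPLE_SASS, M))  # prints 4096
```

I realize this is only a static count: an instruction inside a loop is counted once, and a predicated instruction is counted even for threads where the predicate is false, so I am not sure how close it would be to what nvprof reports.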

How can I get the assembly code of a CUDA kernel?