CUDA "fmsub" performance against negation+fma

OK, the first question is solved then. Yes, fma is defined as fma(float, float, float). I guess that only one explicit cast to float is needed, and the rest are converted automatically to perform the addition and the multiplication.

How about the second one? Should I use fma, or leave it as an integer subtraction and multiplication?