These two multiply instructions are usually architecture dependent. How are they implemented in CUDA?
mullo(a,b) -> |_ (a x b) / 2^W _|
mulhi(a,b) -> (a x b) mod 2^W
where W is the processor’s word size, so 2 elevated to the power of W. And, |_ _| depicts floor function.
In fact, I am having trouble understanding the concept of mullo and mulhi. Can someone point out a place where can I read more about these type of instructions?