mul24/mad24 with 16-bit operands and half register access...

rostam · July 17, 2008, 6:13pm

I’d like to replace a 32-bit “mul” instruction in my code with one “mul24” followed by one “mad24”, to reduce the execution time from 16 to 8 cycles (the math works because one of the operands has a small range).
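
To make the arithmetic concrete, here is a minimal host-side C sketch of one decomposition that would work. It is only an illustration: the 16-bit split of x, the assumption that y < 256, and the helper names mul24_lo, mad24_lo and mul32_via_24 are hypothetical and not taken from my actual kernel. The helpers model the .lo.u32 behaviour as the PTX ISA reference describes it (multiply the low 24 bits of each operand, keep the low 32 bits of the 48-bit product); they are not real intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of mul24.lo.u32: multiply the low 24 bits of each
   operand and keep the low 32 bits of the 48-bit product. */
static uint32_t mul24_lo(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)(a & 0xFFFFFFu) * (uint64_t)(b & 0xFFFFFFu);
    return (uint32_t)p;
}

/* Host-side model of mad24.lo.u32: mul24.lo plus a 32-bit addend,
   wrapping modulo 2^32. */
static uint32_t mad24_lo(uint32_t a, uint32_t b, uint32_t c) {
    return mul24_lo(a, b) + c;
}

/* Hypothetical decomposition, assuming y < 256:
     x * y = (xh * 2^16 + xl) * y = xh * (y << 16) + xl * y   (mod 2^32)
   xl and xh are 16-bit values and (y << 16) fits in 24 bits, so every
   operand handed to the 24-bit helpers is within range. */
static uint32_t mul32_via_24(uint32_t x, uint32_t y) {
    assert(y < 256u);                   /* the "small range" assumption */
    uint32_t xl = x & 0xFFFFu;          /* lower 16 bits of x */
    uint32_t xh = x >> 16;              /* upper 16 bits of x */
    uint32_t low = mul24_lo(xl, y);     /* xl * y, at most 24 bits */
    return mad24_lo(xh, y << 16, low);  /* xh * y * 2^16 + low */
}
```

For any 32-bit x and any y below 256, mul32_via_24(x, y) should equal the low 32 bits of x * y. The sketch ignores the cost of the mask and shift needed to split x, which is exactly why the half-register question below matters.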

Q1) Surprisingly, mul24/mad24 do not seem to allow 16-bit operands (.type = { .u32, .s32 } in Tables 22 and 23 of the PTX ISA reference), unlike mul/mad (where .itype = { .u16, .u32, .u64, .s16, .s32, .s64 } in Tables 20 and 21).
Why is that? I had the impression that mul is handled by the mul24 hardware engine through a series of calls, so it doesn’t make sense that .u16/.s16 is supported for mul/mad but not for mul24/mad24, right?

Q2) Is it possible to access the upper or lower half of a full register through half registers, e.g., using %rh11 to reference the upper half of %r5 (remember ah, al, ax, eax in x86)?
Again, for performance reasons, I’d like to avoid “cvt” or other extra instructions.

thanks