I'm reading through some PTX code generated by decuda, and I see .half a lot.
Examples:
rcp.half.f32 $r3, $r3
mul.half.rn.f32 $r10, $r5, $r3
mov.half.b32 $r3, $r4
Does it mean the instruction operates only on the lower 2 bytes? Or does it perform the full 32-bit operation but return only 2 bytes?
And with the mov, is it just transferring 2 bytes? Am I correct to assume it defaults to the lower 2 bytes?