The instruction count for VABSDIFF4 increases in CUDA 9.2 (Volta)

Thank you so much, everyone.

I understand that this is most likely a bug in the documentation of vabsdiff4.
I will consider replacing the inline assembly with the SIMD device function intrinsics.
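
For anyone who finds this thread later, here is a rough sketch of what I mean by that replacement. It assumes the inline assembly used the unsigned, per-byte form of vabsdiff4; if your code uses the signed form, __vabsdiffs4 would be the counterpart. The wrapper, kernel, and reference function names (abs_diff_u4, sad_u4, abs_diff_kernel, abs_diff_u4_reference) are only made up for illustration; __vabsdiffu4 and __vsadu4 are the CUDA SIMD intrinsics.

```cuda
#include <cuda_runtime.h>

// Per-byte unsigned absolute difference of two packed 4x8-bit values,
// using the SIMD intrinsic instead of inline PTX.
__device__ __forceinline__ unsigned int abs_diff_u4(unsigned int a, unsigned int b)
{
    return __vabsdiffu4(a, b);
}

// Sum of the four per-byte unsigned absolute differences.
// Assumption: this is the variant to reach for if the inline PTX
// accumulated the byte differences into a single value.
__device__ __forceinline__ unsigned int sad_u4(unsigned int a, unsigned int b)
{
    return __vsadu4(a, b);
}

// Hypothetical kernel: one packed word per thread.
__global__ void abs_diff_kernel(const unsigned int* a, const unsigned int* b,
                                unsigned int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = abs_diff_u4(a[i], b[i]);
    }
}

// Host-side reference for checking the kernel output byte by byte.
unsigned int abs_diff_u4_reference(unsigned int a, unsigned int b)
{
    unsigned int result = 0;
    for (int k = 0; k < 4; ++k) {
        unsigned int x = (a >> (8 * k)) & 0xFFu;
        unsigned int y = (b >> (8 * k)) & 0xFFu;
        unsigned int d = (x > y) ? (x - y) : (y - x);
        result |= d << (8 * k);
    }
    return result;
}
```

Comparing the kernel output against abs_diff_u4_reference on the host, and then checking the generated SASS with cuobjdump, should show whether the intrinsic version avoids the extra instructions on Volta. I have not measured that here.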

Best regards.