I’ve been trying to optimize my kernel, and tonight I started bringing in the video instructions. These are integer instructions that perform two operations at once, such as multiply-and-add and shift-and-add. They seem great, since in compiled code each of these otherwise takes two normal instructions. Is there any reason not to use the video instructions? Do they have lower throughput? I haven’t benchmarked the kernels with the video instructions yet, but according to cuobjdump they do shave off instructions. If their throughput isn’t lower, I can’t imagine why the assembler doesn’t use them.
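For anyone who wants to measure this rather than guess, here is a minimal sketch of a timing comparison for the shift-and-add case. It is not the kernel from the question: the names `shl_add_video`, `shl_add_plain`, `chain`, and `time_ms` are hypothetical, the launch sizes are arbitrary, and the inline PTX string is written from the scalar video instruction syntax in the PTX ISA, so treat it as an assumption to check against your toolkit. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Shift-and-add as one PTX video instruction: (x << shift) + addend.
// .clamp clamps the shift amount to 32 instead of wrapping it.
__device__ __forceinline__ unsigned int shl_add_video(unsigned int x, unsigned int shift, unsigned int addend)
{
    unsigned int ret;
    asm("vshl.u32.u32.u32.clamp.add %0, %1, %2, %3;"
        : "=r"(ret) : "r"(x), "r"(shift), "r"(addend));
    return ret;
}

// The same operation in plain C++ (only valid for shift < 32).
__device__ __forceinline__ unsigned int shl_add_plain(unsigned int x, unsigned int shift, unsigned int addend)
{
    return (x << shift) + addend;
}

// Hypothetical test kernel: a long chain of shift-and-adds per thread.
// shift is a kernel argument so the compiler cannot fold it away.
template <bool UseVideo>
__global__ void chain(unsigned int *out, unsigned int shift, int iters)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int acc = tid;
    for (int i = 0; i < iters; ++i)
        acc = UseVideo ? shl_add_video(acc, shift, tid)
                       : shl_add_plain(acc, shift, tid);
    out[tid] = acc;
}

// Times one launch with CUDA events, after an untimed warm-up launch.
template <bool UseVideo>
float time_ms(unsigned int *d_out, int blocks, int threads, int iters)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    chain<UseVideo><<<blocks, threads>>>(d_out, 1u, iters);  // warm-up
    cudaEventRecord(start);
    chain<UseVideo><<<blocks, threads>>>(d_out, 1u, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int blocks = 1024, threads = 256, iters = 1 << 16;  // arbitrary sizes
    unsigned int *d_out = nullptr;
    cudaMalloc(&d_out, blocks * threads * sizeof(unsigned int));

    float plain_ms = time_ms<false>(d_out, blocks, threads, iters);
    float video_ms = time_ms<true>(d_out, blocks, threads, iters);
    printf("plain: %.3f ms, video: %.3f ms\n", plain_ms, video_ms);

    cudaFree(d_out);
    return 0;
}
```

With this many resident warps the dependent chain in each thread should be hidden well enough that the timing reflects instruction throughput more than latency, but that is worth confirming by varying the launch size. It is also worth running cuobjdump on both kernels, since how the PTX video instruction maps to machine instructions depends on the target architecture.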