Would people from nVidia shed some light on 16bit and 32bit integer arithmethic performance? There is lots of talk about float point performance in the doc, but only a sentance or two on 16bit and 32bit arithmetic performance. I would like to know more about integer arithmetic latencies as most of my code involves 16bit and 32bit, both signed and unsigned operations.
Thanks.