Binary Operation in CUDA

I have a question about how to code binary operations efficiently in CUDA. I noted in the AES example in GPU Gems 3 that there were some improvements in PTX 1.3. I don’t really want to buy the book for just one example, so I haven’t seen any of the example code there. Can anyone please tell me what the improvements are, and perhaps share some sample code showing how I should optimize binary operations in CUDA?


GPU Gems 3 was written well before compute capability 1.3, so I don’t think there is anything in there about improvements in PTX 1.3.

Anyhow, …ms3_part01.html suggests that it will be available online. I have a copy at work, so if you send me a PM on Monday, I can check what is in there.