I have a question about how to code binary (bitwise) operations efficiently in CUDA. In the AES example in GPU Gems 3, I noticed that it mentions some improvements in PTX 1.3. I don't really want to buy the book for a single example, so I haven't seen the example code itself. Could anyone tell me what those improvements are, and perhaps share some sample code showing how I should optimize bitwise operations in CUDA?
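I can't say what the PTX 1.3 changes in that chapter actually were. The following is only a minimal sketch of two general bitwise tricks that come up in AES-style code, not the GPU Gems 3 code. The first is packing four bytes into one 32-bit word so a single XOR/shift/mask sequence works on all four at once. The second is a word rotate that uses the real CUDA intrinsic `__funnelshift_l` on devices of compute capability 3.2 and later, with a plain shift fallback elsewhere. The names `rotl32`, `xtime4` and `xtime4_kernel` are hypothetical. The file is written so it also compiles as ordinary C++ (the kernel is only built under `nvcc`), which makes the helpers easy to check on the host.

```cpp
#include <cstdint>

using std::uint32_t;

// Let the helpers compile as plain C++ when not built with nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Rotate a 32-bit word left by n bits (n taken modulo 32).
__host__ __device__ inline uint32_t rotl32(uint32_t x, unsigned n) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 320
    // Funnel shift of the 64-bit value (x:x); the high 32 bits are the rotation.
    return __funnelshift_l(x, x, n);
#else
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
#endif
}

// Multiply each of the four bytes in w by 2 in GF(2^8) (AES "xtime"),
// using only full-width 32-bit operations (SIMD within a register).
__host__ __device__ inline uint32_t xtime4(uint32_t w) {
    uint32_t hi = w & 0x80808080u;          // top bit of each byte
    uint32_t lo = (w & 0x7f7f7f7fu) << 1;   // shift without crossing byte borders
    return lo ^ ((hi >> 7) * 0x1bu);        // reduce bytes that overflowed
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread transforms one packed 32-bit word.
__global__ void xtime4_kernel(const uint32_t* in, uint32_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = xtime4(in[i]);
}
#endif
```

For example, `xtime4(0x57ae478e)` gives `0xae478e07`, matching byte-wise xtime on `{57}`, `{ae}`, `{47}` and `{8e}`. Whether packed arithmetic like this beats table lookups in shared memory depends on the GPU and the access pattern, so it is worth profiling both approaches.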