Are there plans to implement __int128_t and __uint128_t in the nvcc compiler for CUDA? The gcc compiler already supports these types on the x86_64 architecture, and I have noticed that the sm_20 architecture supports long and unsigned long, both 64 bits wide. What is the status here?
If you would like to see this in CUDA, I would suggest filing an RFE (request for enhancement) through the bug reporting mechanism. Please note that GPUs so far are based on a 32-bit architecture with a few extensions, so this would be more like asking for 128-bit support on plain old 32-bit x86, which I believe gcc does not offer. In case you just need addition and multiplication, check out the code I posted on StackOverflow (a simple sketch of the general approach follows the link):
http://stackoverflow.com/questions/6162140/128-bit-integer-on-cuda/6220499#6220499
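For illustration only, here is a minimal sketch, not the code from the linked post, of how 128-bit unsigned addition and a full 64x64-to-128-bit multiplication can be built on the device from two 64-bit limbs, using carry propagation and the __umul64hi() intrinsic. The struct name my_uint128 and the helper function names are made up for this example.

#include <cstdio>
#include <cstdint>

struct my_uint128 {
    unsigned long long lo;  // low  64 bits
    unsigned long long hi;  // high 64 bits
};

// 128-bit addition: add the low limbs, then propagate the carry into the high limbs.
__device__ my_uint128 add_uint128(my_uint128 a, my_uint128 b)
{
    my_uint128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  // carry out of the low limb
    return r;
}

// Full 64x64 -> 128-bit multiplication; __umul64hi() returns the upper 64 bits of the product.
__device__ my_uint128 mul_64_64(unsigned long long a, unsigned long long b)
{
    my_uint128 r;
    r.lo = a * b;               // low 64 bits of the product
    r.hi = __umul64hi(a, b);    // high 64 bits of the product
    return r;
}

__global__ void demo(my_uint128 *out)
{
    my_uint128 x = {0xFFFFFFFFFFFFFFFFULL, 0ULL};   // 2^64 - 1
    my_uint128 y = {1ULL, 0ULL};
    out[0] = add_uint128(x, y);                     // expect lo = 0, hi = 1 (i.e. 2^64)
    out[1] = mul_64_64(0xFFFFFFFFFFFFFFFFULL, 2ULL);
}

int main()
{
    my_uint128 *d_out, h_out[2];
    cudaMalloc(&d_out, 2 * sizeof(my_uint128));
    demo<<<1, 1>>>(d_out);
    cudaMemcpy(h_out, d_out, 2 * sizeof(my_uint128), cudaMemcpyDeviceToHost);
    printf("sum:  hi=%llu lo=%llu\n", h_out[0].hi, h_out[0].lo);
    printf("prod: hi=%llu lo=%llu\n", h_out[1].hi, h_out[1].lo);
    cudaFree(d_out);
    return 0;
}

The same limb-based idea extends to subtraction and comparisons; division is considerably more work and is usually the part that makes a full 128-bit type request worthwhile.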