Recently, I found that computation with the 64-bit data type (unsigned long) is much more expensive than with the 32-bit data type (unsigned int). For example, for a simple map with the % operator, the 64-bit version is nearly 10x slower than the 32-bit one. Moreover, for a radix sort, 64-bit is again around 4x slower. This seems a bit odd, since the performance gap between 32- and 64-bit operations on CPUs is nowhere near that large.
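For reference, the modulo microbenchmark I ran looks roughly like this (a minimal sketch; the kernel and variable names here are made up for illustration, the two kernels differ only in element type):

```cuda
#include <cstdint>

// 32-bit version: one modulo per element.
__global__ void mod_map_u32(const unsigned int* in, unsigned int* out,
                            unsigned int divisor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] % divisor;
}

// 64-bit version: identical code, but operating on 64-bit integers.
__global__ void mod_map_u64(const unsigned long long* in, unsigned long long* out,
                            unsigned long long divisor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] % divisor;
}
```

Both kernels were launched over the same number of elements, and only the 64-bit one shows the ~10x slowdown.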
So I'm wondering: is the 64-bit integer type natively supported on GPUs, or is it emulated in software (e.g., through multiple 32-bit instructions)? If anyone knows the answer, please let me know. Thanks in advance!