Hi,
I understand that all GPUs are 32 bit as of today and there seems to be no need to switch to 64 bit in the near future. However, some PTX instructions do support 64-bit operands. For example, the simple bitwise logical operations AND/OR/XOR are available for 32-bit data as well as for 64-bit data.
Is it safe to assume that the 64-bit version is never slower than two calls to the 32-bit version?
Thanks.
The throughput of bitwise logical operations on 64-bit integers is not listed in the official table (CUDA C++ Programming Guide).
This means that the following should apply:
"Other instructions and functions are implemented on top of the native instructions. The implementation may be different for devices of different compute capabilities, and the number of native instructions after compilation may fluctuate with every compiler version."
According to the SASS shown on Compiler Explorer, nvcc 12.3.1 uses two 32-bit logical operations for one 64-bit operation.
Other 64-bit operations can be significantly slower than their 32-bit versions, most notably 64-bit floating-point arithmetic on consumer GPUs.
Hi,
thanks for the information. I was looking for a general rule of thumb for which one to use when you have the choice, like:
xor.b64 res_ui64, x_ui64, y_ui64
vs
xor.b32 res_ui32_0, x_ui32_0, y_ui32_0
xor.b32 res_ui32_1, x_ui32_1, y_ui32_1
with ui64 → ( ui32_0 << 32 ) | ui32_1
Thanks.
That is PTX code, which will be further compiled to optimized SASS code.
I would simply use the PTX instruction intended for the respective data type, so xor.b64 for 64-bit values.
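To make the comparison concrete, here is a minimal sketch in C++ that also builds as CUDA C++ under nvcc. It is only an illustration: the helper names xor64 and xor64_split and the HD macro are made up for this example, and the packing follows the convention from the question, with ui32_0 as the upper half. Both functions return the same value, so the choice only affects how the source reads, not the result.

```cpp
#include <cstdint>

// Hypothetical macro: lets the same source build with nvcc (host + device)
// or with a plain C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// 64-bit XOR written directly on the 64-bit type; the compiler decides
// which machine instructions to emit for the target.
HD inline std::uint64_t xor64(std::uint64_t x, std::uint64_t y) {
    return x ^ y;
}

// The same result assembled by hand from two 32-bit XORs.
// Packing as in the question: ui64 = (ui32_0 << 32) | ui32_1.
HD inline std::uint64_t xor64_split(std::uint64_t x, std::uint64_t y) {
    std::uint32_t hi = static_cast<std::uint32_t>(x >> 32) ^
                       static_cast<std::uint32_t>(y >> 32);
    std::uint32_t lo = static_cast<std::uint32_t>(x) ^
                       static_cast<std::uint32_t>(y);
    // Widen before shifting so the upper half is not lost.
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

The direct version is shorter and leaves the splitting, if any is needed, to the compiler. Note also that the split version has to widen the upper half to 64 bits before shifting it.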