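CUDA C offers a way to avoid shared-memory bank conflicts with double-precision data: each double is split into its low and high 32-bit halves, which are stored in separate shared-memory int variables,
__shared__ int shared_low; __shared__ int shared_hi;
using the following intrinsic functions:
__double2loint() __double2hiint() __hiloint2double()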
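For reference, here is a minimal CUDA C sketch of the pattern I mean. The kernel name split_roundtrip, the host helper run_roundtrip, the block size BLOCK, and the array names shared_lo/shared_hi are my own placeholders and not taken from any official sample. It assumes the grid is launched with exactly BLOCK threads per block and that the data length is a multiple of BLOCK.

```cuda
#include <cuda_runtime.h>

#define BLOCK 64  /* hypothetical block size chosen for this sketch */

/* Each thread stores its double as two 32-bit halves in shared memory,
   then rebuilds the value written by the mirrored thread in the same block.
   Assumes blockDim.x == BLOCK and a data length that is a multiple of BLOCK. */
__global__ void split_roundtrip(const double *in, double *out)
{
    __shared__ int shared_lo[BLOCK];
    __shared__ int shared_hi[BLOCK];

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    double v = in[gid];
    shared_lo[tid] = __double2loint(v);   /* low 32 bits of the double  */
    shared_hi[tid] = __double2hiint(v);   /* high 32 bits of the double */
    __syncthreads();

    int src = blockDim.x - 1 - tid;       /* mirrored index within the block */
    out[gid] = __hiloint2double(shared_hi[src], shared_lo[src]);
}

/* Host helper: returns the number of mismatches (0 means every value
   survived the split and reassembly), or -1 if device allocation failed. */
int run_roundtrip(void)
{
    const int n = 2 * BLOCK;
    double h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0 / (i + 1);

    double *d_in = NULL, *d_out = NULL;
    if (cudaMalloc((void **)&d_in, n * sizeof(double)) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d_out, n * sizeof(double)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);
    split_roundtrip<<<n / BLOCK, BLOCK>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        int block = i / BLOCK, t = i % BLOCK;
        double expected = h_in[block * BLOCK + (BLOCK - 1 - t)];
        if (h_out[i] != expected) ++bad;   /* bit-exact comparison is intended */
    }
    return bad;
}
```

The point of the two int arrays is that each shared-memory access is 32 bits wide, so consecutive threads touch consecutive 32-bit words. What I am looking for is the CUDA Fortran equivalent of the three intrinsic calls in the kernel.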
Is there a better way to do this in CUDA Fortran, and/or are these functions implemented in CUDA Fortran?
If possible, could someone provide some sample code?