Division in CUDA Fortran

I am seeing slow performance when using the division operator in a kernel. May I ask how CUDA Fortran maps a division to native code? In CUDA C there are different ways to do division (which work differently for integer, FP32, FP64…).

Thanks,
Tuan

Hi Tuan,

By default, we use the more precise but slower “fdiv” function to perform divides. To have the divide operator (“/”) use the faster but less precise divide instead, compile with “-Mcuda=fastmath”.
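As a rough sketch of how you could compare the two, here is a small test case. The module, kernel, and file names (div_demo, divide, div_demo.cuf) are made up for illustration, and I am assuming the pgfortran driver. The source is identical in both builds; only the compile flag changes.

```fortran
! Hypothetical file name: div_demo.cuf
! Default (more precise) divide:    pgfortran -Mcuda div_demo.cuf
! Faster, less precise divide:      pgfortran -Mcuda=fastmath div_demo.cuf
module div_demo
  use cudafor
  implicit none
contains
  ! Element-wise single-precision divide: c = a / b
  attributes(global) subroutine divide(a, b, c, n)
    integer, value :: n
    real :: a(n), b(n), c(n)   ! kernel dummy arrays live in device memory
    integer :: i
    ! CUDA Fortran thread and block indices are 1-based
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) c(i) = a(i) / b(i)
  end subroutine divide
end module div_demo

program main
  use cudafor
  use div_demo
  implicit none
  integer, parameter :: n = 1024
  real :: c(n)
  real, device :: a_d(n), b_d(n), c_d(n)

  a_d = 1.0
  b_d = 3.0
  ! Launch enough 256-thread blocks to cover all n elements
  call divide<<<(n + 255) / 256, 256>>>(a_d, b_d, c_d, n)
  c = c_d                      ! copy the result back to the host
  print *, c(1)
end program main
```

Timing the kernel and comparing the printed value between the two builds should show whether the divide is really what is slowing you down, and how much precision the faster version costs for your data.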

Hope this helps,
Mat

Thanks, Mat