As I try everything for that last bit of performance, I was wondering: is there a way to use mul24 with CUDA Fortran? The -Mcuda help output doesn’t list it as an option (whereas it does exist for -ta=nvidia).
I have no idea if it’ll help, but I have to try!
Hi Matt,
Sorry, there is no mul24 sub-option for -Mcuda. In CUDA C, the programmer needs to explicitly call “__mul24(x,y)” to get the 24-bit multiply. We might be able to add this as a callable routine, but it’s my understanding that 24-bit multiplies are slower than the default 32-bit multiply on Fermi, so it would have a short shelf life in terms of usefulness.
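For reference, here is a minimal sketch of what the explicit call looks like on the CUDA C side. The kernel name `scale24`, the array size, and the factor are made up for illustration; only the `__mul24` intrinsic itself comes from CUDA C. Note that `__mul24` multiplies just the low 24 bits of each operand, so it only matches a normal multiply when both values fit in 24 bits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (name and arguments are illustrative only):
// scales each element with the 24-bit multiply intrinsic.
__global__ void scale24(const int *in, int *out, int factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __mul24 multiplies the low 24 bits of each operand and
        // returns the low 32 bits of the product.
        out[i] = __mul24(in[i], factor);
    }
}

int main()
{
    const int n = 8;          // small illustrative size
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    // One block of n threads is enough for this toy example.
    scale24<<<1, n>>>(d_in, d_out, 3, n);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i)
        printf("%d * 3 = %d\n", h_in[i], h_out[i]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Timing this against a plain `in[i] * factor` on the target card would be the way to see whether the intrinsic helps or hurts there.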
Is this important for your code?
Mat,
No, it’s probably not that important (there aren’t many integer multiplies), but I thought I’d ask.
Thanks,
Matt