Disable Fused Multiply-Add (FMA) with Numba

Hello everyone, I am new to CUDA programming, and I want to compare the performance of my algorithm implemented on the CPU and on the GPU.
I am using Python, and my algorithm now runs on both the CPU and the GPU, but the computed results differ between the two. I think the reason may be fused multiply-add (FMA), which exists on the GPU but is not supported by the CPU. So I want to disable this operation on the GPU to verify whether my guess is right.
On the CPU my code is pure Python, and on the GPU it is written with Numba. My system is Ubuntu 14.04, and the NVIDIA graphics card is a Tesla K40c (compute capability 3.5).
Does anyone have any ideas?

The CUDA compiler lets you turn off the contraction of floating-point multiply followed by dependent floating-point add into FMA by passing the switch -fmad=false to nvcc.

I do not know whether or how you can set this compiler switch in the Numba environment. You may want to check the relevant documentation or dig through the sources (I think it’s an open-source project?).

Note that turning off the generation of FMAs will likely have a negative impact on both the performance and the accuracy of floating-point computation.
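To illustrate why FMA changes results at all, here is a minimal CPU-only sketch in plain Python. It does not touch the GPU or Numba. The function names are made up for this example, and the "fused" path is only an emulation: it computes a*b + c exactly with `fractions.Fraction` and rounds once, which is what an FMA does, whereas the unfused path rounds after the multiply and again after the add.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    """Multiply, round to double, then add and round again (two roundings)."""
    product = a * b
    return product + c


def mul_add_fused_emulated(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product is 1 - 2**-60,
# which rounds to 1.0 when stored as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = mul_add_unfused(a, b, c)        # the small term is lost: 0.0
fused = mul_add_fused_emulated(a, b, c)   # the small term survives: -2**-60

print("unfused:", unfused)
print("fused  :", fused)
```

With these inputs the two paths disagree, and the fused result is the more accurate one. Differences of this kind are tiny, so results that differ a lot between CPU and GPU probably have some other cause.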

Thanks njuffa,
Yes, I know there may be ways to achieve what I want with the CUDA toolkit, but so far I have not found the equivalent in Numba.
I also understand that FMA is good for my algorithm; I just want to find the cause of the different results.
Thank you all the same.

A simple Google search on “numba fma” turned up this page:


which may be of interest

This seems to be the relevant section from txbob’s link:

That would seem to refer to online-compilation, though?

Thanks njuffa and txbob,
That may be what I am looking for, but unfortunately I do not know how to use it. I am also new to Python. Right now I just have a single .py file with my algorithm and no other output files. I do not know how to proceed.
Can you give me some hints?

I last used Python 15 years ago and don’t know Numba. You are likely to get a quicker / better answer on how to control Numba compilation by asking on a forum or mailing list dedicated to that product, so that is what I would recommend.

Hi njuffa,
I am sorry, this was my mistake: the different results were caused by my misunderstanding of row-major and column-major array storage, which is different in Python and CUDA. After I corrected that, the results match again. Haha.
Thanks everyone.
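For anyone who lands here with the same symptom, this is a small pure-Python sketch of how a row-major/column-major mix-up produces wrong values. The helper names and the 2x3 matrix are invented for illustration and are not taken from the original code.

```python
def flat_index_row_major(row, col, n_rows, n_cols):
    """Offset of element (row, col) when rows are stored one after another."""
    return row * n_cols + col


def flat_index_col_major(row, col, n_rows, n_cols):
    """Offset of element (row, col) when columns are stored one after another."""
    return col * n_rows + row


n_rows, n_cols = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]

# Flatten the same matrix in both storage orders.
row_major = [matrix[r][c] for r in range(n_rows) for c in range(n_cols)]
col_major = [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]

# Reading element (0, 1) from the row-major buffer:
right = row_major[flat_index_row_major(0, 1, n_rows, n_cols)]
# Same buffer, but indexed as if it were column-major:
wrong = row_major[flat_index_col_major(0, 1, n_rows, n_cols)]

print("row-major buffer:", row_major)
print("col-major buffer:", col_major)
print("element (0, 1):", right, "but with the wrong formula:", wrong)
```

The indexing formula has to match the order in which the buffer was filled. A mismatch does not raise an error; it just reads a different element, which looks like a numerical difference between the CPU and GPU versions.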