I’m writing CUDA kernels for PyTorch and need to disable fused multiply-add (FMA) for certain reasons. Following the official tutorial, I added ‘--fmad=false’ to the nvcc flags. However, when I inspected the SASS of the generated .pyd with cuobjdump to make sure everything was correct, I found that only some of the FFMA instructions had been replaced.
That’s weird. Did I do something wrong, or is this the expected behavior?
I’m using Windows 10, PyTorch 1.7 + CUDA 11.0, gencode=arch=compute_61,code=sm_61.
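To put a number on "only some of the FFMA instructions", one option is to tally opcodes in the listing produced by `cuobjdump -sass` on the built module. This is a minimal sketch: the helper name `count_sass_opcodes` and the regular expression are my own assumptions about the listing format (an offset comment such as `/*0010*/`, an optional predicate like `@P0`, then the opcode). They are not part of any NVIDIA tool.

```python
import re
from collections import Counter

# Assumed shape of one instruction line in `cuobjdump -sass` output, e.g.
#   /*0010*/                   FFMA R0, R2, R3, R0 ;
#   /*0018*/              @P0  FFMA.FTZ R4, R4, R5, R6 ;
# Encoding comments like "/* 0x4c98078000870001 */" have a space after "/*",
# so they do not match the offset pattern. Group 1 captures the opcode plus
# any modifiers (e.g. "FFMA.FTZ").
_SASS_LINE = re.compile(
    r"/\*[0-9a-fA-F]{4,}\*/\s+"          # instruction offset comment
    r"(?:@!?U?P(?:\d|T)\s+)?"            # optional predicate (@P0, @!PT, ...)
    r"([A-Z][A-Z0-9_]*(?:\.[A-Z0-9_]+)*)"  # opcode with modifiers
)


def count_sass_opcodes(sass_text: str) -> Counter:
    """Count base opcodes in a SASS listing, ignoring modifiers like .FTZ."""
    counts = Counter()
    for match in _SASS_LINE.finditer(sass_text):
        base_opcode = match.group(1).split(".")[0]
        counts[base_opcode] += 1
    return counts
```

Usage would be something like saving `cuobjdump -sass my_ext.pyd` (a hypothetical module name) to a text file, reading it, and comparing `counts["FFMA"]` against `counts["FMUL"]` and `counts["FADD"]` for builds with and without ‘--fmad=false’.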