nvcc FMAD detection doesn't seem to work: what syntax produces an FMAD instruction?

I had code like this:

for (unsigned int j = 0; j < filterSize; j++)
{
    // Get filter operand
    filterOp1 = filter[j];

    // Get RF data operand
    RFOp1 = texfetch(texRFData, x, yBase + i + j);

    res += filterOp1 * RFOp1;
}

which I changed to:

for (unsigned int j = 0; j < filterSize; j += 2)
{
    // Get filter operands
    filterOp1 = filter[j];
    filterOp2 = filter[j + 1];

    // Get RF data operands
    RFOp1 = texfetch(texRFData, x, yBase + i + j);
    RFOp2 = texfetch(texRFData, x, yBase + i + j + 1);

    res += filterOp1 * RFOp1 + filterOp2 * RFOp2;
}

I wrote it this way to get an A * B + C * D form, thinking nvcc would convert it to an FMAD instruction, but I saw no performance gain…

Do you know how to get nvcc to generate it?

The statement res += filterOp1 * RFOp1 is already a MAD (multiply filterOp1 by RFOp1, then add the product to res).

The statement res += filterOp1 * RFOp1 + filterOp2 * RFOp2 is just two MADs.

I don’t see why you would expect a performance increase from the first version to the second based on the instructions you expect the compiler to generate. The second version unrolls the loop by hand, so it’s possible that the unrolling itself could have given some performance gain…

I don’t know at which stage of compilation nvcc tries to find MADs, whether before or after unrolling.

I know that with intrinsics, MADs are written as A*B + C*D, so I wanted to highlight that form in the syntax.

A MAD is actually just A*B + C.

http://en.wikipedia.org/wiki/Multiply-accumulate
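
To make that concrete, here is a minimal host-side C++ sketch (not the kernel from this thread, and not a statement about what nvcc actually emits). The names mad, dot_one_tap and dot_two_taps are made up for illustration, and plain arrays stand in for the filter and the texture fetches. It uses std::fma to spell out each A*B + C step, and shows the two-tap loop body written as two chained multiply-adds, the same work per element as the one-tap loop:

```cpp
#include <cassert>
#include <cmath>

// One multiply-add: a * b + c as a single fused operation.
// (Hypothetical helper; std::fma is the standard C++ fused multiply-add.)
inline float mad(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Shape of the original loop: one multiply-add per filter tap.
inline float dot_one_tap(const float* filter, const float* data, unsigned int n) {
    float res = 0.0f;
    for (unsigned int j = 0; j < n; ++j) {
        res = mad(filter[j], data[j], res);
    }
    return res;
}

// Shape of the hand-unrolled loop: two taps per iteration,
// written out as two chained multiply-adds.
inline float dot_two_taps(const float* filter, const float* data, unsigned int n) {
    assert(n % 2 == 0);  // the unrolled loop reads j and j + 1, so n must be even
    float res = 0.0f;
    for (unsigned int j = 0; j < n; j += 2) {
        res = mad(filter[j], data[j], res);          // first MAD
        res = mad(filter[j + 1], data[j + 1], res);  // second MAD
    }
    return res;
}
```

Both functions perform one multiply-add per tap, so the unrolled form saves loop overhead at most, not arithmetic instructions.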

Ok thanks =)