Continuing my arithmetic benchmarking on a Fermi GPU, I noticed that performing a fixed number of integer multiplications, say 100 000, takes a certain time T.
If I execute 100 000 integer multiplications together with 100 000 floating-point multiplications, the execution time does not double: it is far less than 2T.
So I was wondering whether my applications (which involve high-performance multi-precision arithmetic) could take advantage of this by using both the FPU and the integer ALU
to perform multiplications, since it seems that a certain level of parallelism can be exploited.
P.S. In my measurements, the same overlap does not occur for additions.
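
For concreteness, a measurement of this kind could be sketched roughly as below. This is only a minimal illustration under assumptions, not the exact benchmark behind the numbers above: the kernel names (`int_only`, `int_and_float`), the helper `time_ms`, the launch configuration, and the multiplier values are all hypothetical. Each thread runs a dependent chain of 100 000 integer multiplications, and the second kernel adds an independent chain of 100 000 float multiplications in the same loop. The compiler may transform these loops, so the generated machine code should be inspected before trusting the timings.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed parameters for illustration; only the count 100 000 comes from the post.
constexpr int N_MULS  = 100000;
constexpr int BLOCKS  = 64;
constexpr int THREADS = 256;

// Integer multiplications only: one dependent chain per thread.
__global__ void int_only(unsigned int *out, unsigned int m)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int a = tid + 1u;
    for (int i = 0; i < N_MULS; ++i)
        a *= m;                 // wrapping 32-bit integer multiply
    out[tid] = a;               // store so the loop is not eliminated
}

// Same integer chain plus an independent float chain of equal length.
__global__ void int_and_float(unsigned int *out_i, float *out_f,
                              unsigned int m, float f)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int a = tid + 1u;
    float x = 1.0f;
    for (int i = 0; i < N_MULS; ++i) {
        a *= m;                 // integer multiply
        x *= f;                 // float multiply, independent of a
    }
    out_i[tid] = a;
    out_f[tid] = x;
}

// Times one kernel launch with CUDA events, after one warm-up launch.
template <typename Launch>
float time_ms(Launch launch)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    launch();                   // warm-up
    cudaDeviceSynchronize();
    cudaEventRecord(start);
    launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = BLOCKS * THREADS;
    unsigned int *d_i = nullptr;
    float *d_f = nullptr;
    if (cudaMalloc(&d_i, n * sizeof(unsigned int)) != cudaSuccess ||
        cudaMalloc(&d_f, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Multipliers are passed at run time so they are not folded into constants.
    const unsigned int m = 2654435761u;
    const float f = 1.0000001f;

    float t_int  = time_ms([&] { int_only<<<BLOCKS, THREADS>>>(d_i, m); });
    float t_both = time_ms([&] { int_and_float<<<BLOCKS, THREADS>>>(d_i, d_f, m, f); });

    std::printf("int only:    %.3f ms (T)\n", t_int);
    std::printf("int + float: %.3f ms (%.2f x T)\n", t_both, t_both / t_int);

    cudaFree(d_i);
    cudaFree(d_f);
    return 0;
}
```

If the second time comes out well below 2T, the two multiplication streams are overlapping to some degree. Replacing both `*=` operations with `+=` would give the corresponding test for additions.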