Christen CSRD, 26(3):205-210 doi
hints that the ratio of calculation performed to data moved should
be 4 to 64 FLOP per data item. I have a problem where the ratio for the
serial algorithm is less than 1 and the data items are integers rather than floats.
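
For illustration only, here is a minimal Python sketch of how such a ratio might be counted. The function name and the element-wise integer add are hypothetical stand-ins, not the actual problem described here, and the count assumes every operand is moved exactly once (no caching or reuse).

```python
def arithmetic_intensity(ops, items_moved):
    """Operations performed per data item moved to or from memory."""
    if items_moved <= 0:
        raise ValueError("items_moved must be positive")
    return ops / items_moved

# Hypothetical kernel: c[i] = a[i] + b[i] over n integer elements.
n = 1_000_000
ops = n               # one integer add per element
items_moved = 3 * n   # two loads and one store per element (assumes no reuse)

ratio = arithmetic_intensity(ops, items_moved)
print(f"{ratio:.2f} operations per data item")
```

Under these assumptions the ratio is below 1, far under the 4 to 64 range quoted above.
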
Do people use arithmetic intensity?
What values are you seeing?
Does the range 4 to 64 FLOP/TDE make sense to you?
How low can the ratio be for it to still make sense to use a GPU?
Does integer vs. float make any difference?
All comments and data welcome.