I’ve been delving into OpenCL and I’m wondering if there is a way to find the theoretical FLOPS of the OpenCL devices on a host. When deploying code, you’d want it to run on the fastest device present, and be able to choose that device dynamically based on the user’s hardware.
I was unable to find a forum post about this.
I thought I would be able to calculate FLOPS from the information returned by clGetDeviceInfo. Unfortunately, querying CL_DEVICE_MAX_COMPUTE_UNITS does not (in all cases) give you the number of individual processors on the device. In NVIDIA’s case, the number of compute units is the number of streaming multiprocessors, not the number of stream processors like I would hope.
I was hoping to use an equation like this to calculate FLOPS:

FLOPS = ClockRate * ALUs * FLOpsPerClockCycle
But without a definite number for the ALUs in a compute device, doing it this way becomes less feasible. I might be able to take the number of compute units and multiply it by 8, but then I would need to introduce a case statement for Fermi devices and multiply by 32 in that case. And that still doesn’t account for ATI devices (which some people may use). At that point the heuristic becomes very inelegant and will break if architectures change in the future.
Has anyone found a good way to do this? How would you go about choosing which device to attach a context, command queue, kernel, etc. to?