After profiling a set of kernels that run concurrently on two GPUs, I noticed the IPC values reported by nvprof:
NOTE: need to scroll right to see the relevant values.
==5924== Metric result:
Invocations                               Metric Name                        Metric Description         Min         Max         Avg
Device "GeForce GTX TITAN X (0)"
    Kernel: compress_y_half(float const *, __half2*, int)
          1                                       ipc                              Executed IPC    1.243167    1.243167    1.243167
    Kernel: sum_buffers_512(float4 const *, float4*)
          1                                       ipc                              Executed IPC    0.137906    0.137906    0.137906
    Kernel: simple_back_512(float const *, __half2 const *, float2*, float, float, int, int)
         64                                       ipc                              Executed IPC    2.879317    3.224608    3.049561
Device "GeForce GTX TITAN X (1)"
    Kernel: compress_y_half(float const *, __half2*, int)
          1                                       ipc                              Executed IPC    1.162051    1.162051    1.162051
    Kernel: simple_back_512(float const *, __half2 const *, float2*, float, float, int, int)
         60                                       ipc                              Executed IPC    2.807879    3.147826    2.962747
Which operations are counted towards that value? Is it only floating-point operations, or do integer and other instructions count as well?
So is this a truly relevant metric of performance?
If so, are the values for my primary workhorse kernel as good as they seem (an average of about 3.05 on the faster of the two GTX Titan X GPUs), or is there some caveat to such a simple conclusion?