I have several arrays of size N, and I need to compute the point-wise average (mean) of a predetermined number, M, of them. In other words, I have M arrays of size N that I need to average together to produce one array of size N.

I know that division is considered very costly on the GPU. I was thinking that the best approach may be to multiply each element in the M arrays by (1/M) and then do simple additions to compute the averages. Would I be better off doing this 1/M multiplication on the CPU before moving the arrays to the GPU for averaging? Is there a GPU function that performs division by a scalar optimally?
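
To make the arithmetic concrete, here is a minimal CPU-side sketch in C++ of the approach I described. It is not GPU code: the body of the outer loop is only meant to stand in for what a single GPU thread would compute for one index i. The names `pointwise_mean` and `inv_m` are made up for illustration, and I am assuming float data with all M arrays having the same length N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Point-wise mean of M arrays of length N (CPU reference sketch).
// Assumes every inner vector has the same length as the first one.
std::vector<float> pointwise_mean(const std::vector<std::vector<float>>& arrays) {
    assert(!arrays.empty());
    const std::size_t M = arrays.size();
    const std::size_t N = arrays[0].size();

    // The only division: the reciprocal 1/M is computed once up front.
    const float inv_m = 1.0f / static_cast<float>(M);

    std::vector<float> mean(N);
    for (std::size_t i = 0; i < N; ++i) {      // one GPU thread per i, hypothetically
        float sum = 0.0f;
        for (std::size_t k = 0; k < M; ++k) {
            sum += arrays[k][i] * inv_m;       // scale each element by 1/M, then add
        }
        mean[i] = sum;
    }
    return mean;
}
```

Written this way, the scaling costs M*N multiplications. Moving the multiplication by `inv_m` outside the inner loop (sum first, then scale once per output element) would need only N multiplications and should give the same result up to floating-point rounding.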