powf and streams on Fermi


Which is faster, and why?

Does the kernel function use the default stream 0?

Then are they equivalent?

Unless the compiler recognizes powf(x,2) as a special form, x*x should be MUCH faster: it compiles to a single MUL, which is 2 clocks. powf(x,2) compiles to exp2f(2*log2f(x)), so those two are identical, but they evaluate to three trips through the SFU, I think (log2f, pre.ex2, ex2), plus a MUL.
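A minimal sketch for checking this yourself, assuming a CUDA toolkit that can still target Fermi (sm_20). The names square_mul, square_via_exp2 and square_forms are hypothetical. square_via_exp2 spells out the exp2f(2*log2f(x)) form from the answer, which also exposes a caveat: it only matches x*x for x > 0, because log2f of a negative number is NaN, whereas powf(x, 2.0f) follows C semantics and returns a positive result for negative x.

```cuda
#include <cmath>

// Squaring with one multiply: a single MUL on the GPU.
__host__ __device__ inline float square_mul(float x) {
    return x * x;
}

// The rewrite described in the answer: exp2f(2 * log2f(x)).
// Only equal to x*x for x > 0; log2f of a negative input yields NaN.
__host__ __device__ inline float square_via_exp2(float x) {
    return exp2f(2.0f * log2f(x));
}

// Hypothetical kernel that puts all three forms side by side so their
// generated machine code can be compared.
__global__ void square_forms(const float* in, float* out_mul,
                             float* out_exp2, float* out_powf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out_mul[i]  = square_mul(in[i]);
        out_exp2[i] = square_via_exp2(in[i]);
        out_powf[i] = powf(in[i], 2.0f);
    }
}
```

Compiling with something like `nvcc -arch=sm_20 -cubin squares.cu` and then running `cuobjdump -sass squares.cubin` lists the instructions each form turns into, which shows directly whether the compiler special-cases powf(x, 2.0f). Note that `-use_fast_math` makes nvcc substitute the intrinsic __powf for powf, which can change both speed and accuracy.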

Thank you!!!
What about the streams?

If you don't specify a stream for a kernel, it uses stream 0 by default. Your second stream example will therefore run the kernel in stream 0.

In other words, I must specify a stream for the kernel, because otherwise the second example does not execute concurrently?

I'm not quite sure whether, e.g., a kernel in stream 0 can overlap with a memcpy in an explicitly created stream. But since the documentation says you need different streams for overlap, I think it should work. Just try it out.
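A minimal sketch of such an experiment. For the legacy default stream, the CUDA C Programming Guide's section on implicit synchronization says that commands from different streams cannot run concurrently if a command to the default stream is issued between them, and default-stream work synchronizes with other blocking streams. So this sketch keeps the kernel and the copy in two explicitly created streams. The names busy_kernel and run_two_stream_demo are hypothetical, and the copy uses pinned memory (cudaMallocHost) because cudaMemcpyAsync only overlaps with kernel execution when the host buffer is page-locked.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical busy kernel: repeats a multiply-add so it runs long enough
// for an overlapping copy to be visible on a profiler timeline.
__global__ void busy_kernel(float* data, int n, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k) v = v * 0.999f + 1.0f;
        data[i] = v;
    }
}

// Launches busy_kernel in one created stream and an independent
// host-to-device copy in another. Returns 0 on success, 1 on any failure.
int run_two_stream_demo(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* h_src = nullptr;   // pinned host buffer, source of the copy
    float* d_work = nullptr;  // buffer the kernel works on
    float* d_dst = nullptr;   // destination of the async copy
    cudaStream_t s_compute = nullptr, s_copy = nullptr;

    bool ok = cudaMallocHost(&h_src, bytes) == cudaSuccess
           && cudaMalloc(&d_work, bytes) == cudaSuccess
           && cudaMalloc(&d_dst, bytes) == cudaSuccess
           && cudaStreamCreate(&s_compute) == cudaSuccess
           && cudaStreamCreate(&s_copy) == cudaSuccess;

    if (ok) {
        for (int i = 0; i < n; ++i) h_src[i] = 1.0f;
        // Issued before any stream work, so it does not block the overlap below.
        ok = cudaMemset(d_work, 0, bytes) == cudaSuccess;
    }
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        // The 4th launch parameter selects the stream; omitting it means stream 0.
        busy_kernel<<<blocks, threads, 0, s_compute>>>(d_work, n, 10000);
        // Independent copy in a second non-default stream: a candidate for overlap.
        cudaMemcpyAsync(d_dst, h_src, bytes, cudaMemcpyHostToDevice, s_copy);
        ok = cudaStreamSynchronize(s_compute) == cudaSuccess
          && cudaStreamSynchronize(s_copy) == cudaSuccess
          && cudaGetLastError() == cudaSuccess;
    }
    if (ok) {
        // Check that the asynchronous copy arrived intact.
        std::vector<float> check(n);
        ok = cudaMemcpy(check.data(), d_dst, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) ok = (check[i] == 1.0f);
    }

    // Release only what was actually created.
    if (s_copy) cudaStreamDestroy(s_copy);
    if (s_compute) cudaStreamDestroy(s_compute);
    cudaFree(d_dst);   // cudaFree on a null pointer is a no-op
    cudaFree(d_work);
    if (h_src) cudaFreeHost(h_src);
    return ok ? 0 : 1;
}
```

Running `run_two_stream_demo(1 << 20)` under a profiler with a timeline view shows whether the kernel and the copy actually overlap on a given card. Replacing `s_compute` with `0` in the launch is a quick way to try the stream-0 variant from the question.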

Thank you!!!