Hello,
which is faster, and why?
Does a kernel launch use the default stream 0?
If so, are the two equivalent?
Unless the compiler recognizes powf(x,2) as a special form, x*x should be MUCH faster: it compiles to a single MUL, which is 2 clocks. powf(x,2) compiles to exp2f(2*log2f(x)), so those two are identical, but that evaluates to, I think, three trips through the SFU (log2f, pre.ex2, ex2) plus a MUL.
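To check this on your own card, something like the following might do. It is only a sketch: the kernel names and the time_square helper are made up for illustration, and the original snippets from the question are not reproduced here. Each kernel squares an array, one with a multiply and one with powf, and the helper times a single launch with CUDA events.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Squares each element with a single multiply.
__global__ void square_mul(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// Squares each element through the general power function.
__global__ void square_powf(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = powf(in[i], 2.0f);
}

// Hypothetical helper: runs one of the two kernels once on `input`,
// stores the squares in `result`, and returns the kernel time in ms.
float time_square(bool use_powf, const std::vector<float>& input,
                  std::vector<float>& result) {
    int n = static_cast<int>(input.size());
    size_t bytes = n * sizeof(float);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Record events around the launch so only the kernel is timed.
    cudaEventRecord(start);
    if (use_powf) {
        square_powf<<<blocks, threads>>>(d_in, d_out, n);
    } else {
        square_mul<<<blocks, threads>>>(d_in, d_out, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    result.resize(n);
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return ms;
}
```

Call it once with use_powf = false and once with true on a large array and compare the two times. A kernel this simple is probably limited by memory bandwidth, so the gap may look smaller than the instruction counts suggest; doing more arithmetic per element would make the difference easier to see.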
Thank you!!!
What about the streams?
If you don't specify a stream for a kernel, it uses stream 0 by default. Your second stream example will use stream 0.
In other words, I must specify a stream for the kernel, because otherwise the execution in the second example is not concurrent?
I'm not quite sure whether, e.g., a kernel in stream 0 can overlap with a memcpy in an explicitly created stream. But as it says you need different streams for overlapping, I think it should work. Just try it out.
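For trying it out, here is a rough sketch that avoids the question by putting the kernel and the copy in two explicitly created streams. The add_one kernel and the kernel_and_copy_in_streams helper are placeholders, not code from this thread. Two things it assumes: the host buffer for cudaMemcpyAsync is page-locked (allocated with cudaMallocHost), which is needed for the copy to be asynchronous, and the device is able to copy and compute at the same time. With the legacy default-stream behavior, stream 0 synchronizes with other ordinary streams, so the sketch does not rely on stream 0 for the kernel.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel: adds 1 to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: runs add_one in one stream while an unrelated
// host-to-device copy is queued in a second stream. The kernel output
// is written to h_result (n floats). Returns true if no CUDA error
// was reported when the device was synchronized.
bool kernel_and_copy_in_streams(float* h_result, int n) {
    size_t bytes = n * sizeof(float);

    float* d_work = nullptr;  // buffer the kernel operates on
    float* d_copy = nullptr;  // destination of the async copy
    float* h_src = nullptr;   // page-locked source of the async copy
    cudaMalloc(&d_work, bytes);
    cudaMalloc(&d_copy, bytes);
    cudaMallocHost(&h_src, bytes);
    for (int i = 0; i < n; ++i) h_src[i] = 2.0f;

    // All-zero bytes are 0.0f, so the kernel should produce 1.0f.
    cudaMemset(d_work, 0, bytes);
    cudaDeviceSynchronize();

    cudaStream_t s_kernel, s_copy;
    cudaStreamCreate(&s_kernel);
    cudaStreamCreate(&s_copy);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Fourth launch parameter selects the stream (third is shared memory).
    add_one<<<blocks, threads, 0, s_kernel>>>(d_work, n);
    cudaMemcpyAsync(d_copy, h_src, bytes, cudaMemcpyHostToDevice, s_copy);

    // Wait for both streams and collect any error.
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_result, d_work, bytes, cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s_kernel);
    cudaStreamDestroy(s_copy);
    cudaFreeHost(h_src);
    cudaFree(d_copy);
    cudaFree(d_work);
    return err == cudaSuccess;
}
```

The code only queues the two operations so that they are allowed to overlap. Whether they really run at the same time is best checked in a profiler timeline. Dropping the stream argument from the launch puts the kernel back in stream 0, which gives the case discussed above for comparison.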
Thank you!!!