How does the GK110's Hyper-Q enable concurrency of multiple streams?

I want to take advantage of the Kepler GK110's Hyper-Q mechanism, i.e., have two streams placed into two different hardware work queues so that some false dependencies are avoided. Do I need to create the two streams from two separate CPU threads, or is this handled automatically by the CUDA driver or by something else, such as the CUDA work distributor?

Alternatively, how can I verify this myself?

Have you had a chance to look at the simpleHyperQ sample app that ships with CUDA? You can find relevant documentation here:

[url]http://docs.nvidia.com/cuda/samples/6_Advanced/simpleHyperQ/doc/HyperQ.pdf[/url]
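If you want a quick experiment of your own before digging into the sample, here is a minimal sketch of the same general idea. It is not the sample's actual code: the `spin` kernel, the `timeLaunches` helper, the stream count, and the cycle count are all made up for illustration. A single host thread creates every stream, queues two dependent kernels per stream, and compares the elapsed time for one stream against eight. If the streams land in separate hardware work queues, the two timings should come out close to each other; if the eight-stream run takes several times longer, the work was serialized.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical busy-wait kernel: spins for roughly `cycles` clock ticks
// so that each launch is long enough for overlap to be visible.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: queue two back-to-back kernels in each of the first
// `n` streams (stream by stream) and return the total elapsed time in ms.
static float timeLaunches(cudaStream_t *streams, int n, long long cycles)
{
    cudaEvent_t start, stop;
    float ms = 0.0f;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaDeviceSynchronize());        // make sure the GPU is idle
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < n; ++i) {
        spin<<<1, 1, 0, streams[i]>>>(cycles);
        spin<<<1, 1, 0, streams[i]>>>(cycles);  // depends on the first
    }
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());        // wait for every stream to finish
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

int main()
{
    const int nStreams = 8;                // assumed value for illustration
    const long long cycles = 50000000LL;   // assumed value, tune per GPU

    // All streams are created by this one host thread.
    cudaStream_t streams[nStreams];
    for (int i = 0; i < nStreams; ++i)
        CHECK(cudaStreamCreate(&streams[i]));

    timeLaunches(streams, 1, cycles);      // warm-up run, result discarded

    float one = timeLaunches(streams, 1, cycles);
    float all = timeLaunches(streams, nStreams, cycles);
    printf("1 stream : %.1f ms\n", one);
    printf("%d streams: %.1f ms\n", nStreams, all);

    for (int i = 0; i < nStreams; ++i)
        CHECK(cudaStreamDestroy(streams[i]));
    return 0;
}
```

Build it with something like `nvcc -arch=sm_35 hyperq_test.cu` (the file name is arbitrary). As far as I know, the number of hardware connections the runtime uses is controlled by the `CUDA_DEVICE_MAX_CONNECTIONS` environment variable, which defaults to 8 and can be raised up to 32, so it is worth setting it explicitly if you try more streams than that.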

That is exactly the answer and method I need. Thanks, njuffa!