I want to benefit from the Hyper-Q mechanism on Kepler GK110, i.e., have two streams placed into two different hardware work queues so that some false dependencies are avoided. Do I need to create the two streams from two separate CPU threads, or is this mapping handled automatically by the CUDA driver or by something else, such as the CUDA work distributor?
Alternatively, how can I verify which of these is the case?
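
For reference, here is a minimal sketch of the kind of test I have in mind. It creates both streams from a single CPU thread, issues a chain of two dependent kernels into each stream in depth-first order, and compares the elapsed time with that of a single chain. The names `spin`, `timeChains`, and `kCycles` are my own placeholders, the cycle count is arbitrary, and I am assuming the legacy default stream (no `--default-stream per-thread`) and a device that supports concurrent kernels. I am not certain this isolates Hyper-Q from other effects.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t e = (call);                                   \
        if (e != cudaSuccess) {                                   \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(e));                       \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Busy-wait for roughly `cycles` GPU clock ticks so that each
// kernel has a measurable duration without using many resources.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Issue a chain of two kernels into each of the first `n` streams
// (depth-first order) and return the elapsed time in milliseconds.
// The events are recorded in the legacy default stream, which
// implicitly synchronizes with the other (blocking) streams.
static float timeChains(cudaStream_t* streams, int n, long long cycles) {
    cudaEvent_t begin, end;
    CHECK(cudaEventCreate(&begin));
    CHECK(cudaEventCreate(&end));

    CHECK(cudaEventRecord(begin, 0));
    for (int i = 0; i < n; ++i) {
        spin<<<1, 1, 0, streams[i]>>>(cycles);
        spin<<<1, 1, 0, streams[i]>>>(cycles);
    }
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(end, 0));
    CHECK(cudaEventSynchronize(end));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, begin, end));
    CHECK(cudaEventDestroy(begin));
    CHECK(cudaEventDestroy(end));
    return ms;
}

int main() {
    const int kStreams = 2;
    const long long kCycles = 100000000LL;  // arbitrary spin length

    // Both streams are created by the same host thread on purpose.
    cudaStream_t streams[kStreams];
    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaStreamCreate(&streams[i]));
    }

    timeChains(streams, 1, 1000);  // warm-up launch, result ignored
    float one = timeChains(streams, 1, kCycles);
    float two = timeChains(streams, kStreams, kCycles);

    printf("one chain:  %.2f ms\n", one);
    printf("two chains: %.2f ms (ratio %.2f)\n", two, two / one);

    for (int i = 0; i < kStreams; ++i) {
        CHECK(cudaStreamDestroy(streams[i]));
    }
    return 0;
}
```

My understanding is that a ratio close to 1 would suggest the two chains ran fully concurrently, while a noticeably larger ratio would suggest that the second stream's first kernel waited behind the first stream's chain. Is that a valid way to check, or would a profiler timeline be more reliable?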