i see little difference; except that i would caution against destroying streams before all work in them is known to have completed
i would also caution against redundant streams
you already use openMP, which implies different cuda contexts, i would think, which in turn implies different (default) streams
and your profiling output seems to support this point
you may possibly benefit from streams only if it is clear that the tasks issued by the openMP threads currently run in the same stream, or that tasks issued by individual openMP threads can be run concurrently
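
if it does turn out that the openMP threads end up in one stream, a rough sketch of giving each thread its own explicit stream could look like the following. this is only an illustration under assumptions, not your code: the kernel `scale`, the function `run_per_thread_streams`, and the chunk layout are all made up for the example. note that each thread synchronizes its stream before destroying it, so all work in the stream is known to be complete at that point

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// hypothetical kernel standing in for whatever each openMP thread really launches
__global__ void scale(float *x, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// processes num_chunks chunks of n floats, one explicit stream per openMP thread
// returns 0 on success, -1 on allocation failure, else a count of failures
int run_per_thread_streams(int num_chunks, int n)
{
    size_t total = (size_t)num_chunks * n;
    float *host = nullptr, *dev = nullptr;
    // pinned host memory, so the async copies can actually overlap with other work
    if (cudaMallocHost(&host, total * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&dev, total * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(host);
        return -1;
    }
    for (size_t i = 0; i < total; ++i) host[i] = 1.0f;

    int failures = 0;
    #pragma omp parallel num_threads(num_chunks) reduction(+:failures)
    {
        cudaStream_t s;  // one stream per thread, created once, not per task
        bool ok = (cudaStreamCreate(&s) == cudaSuccess);
        if (!ok) ++failures;

        #pragma omp for
        for (int c = 0; c < num_chunks; ++c) {
            if (!ok) continue;
            float *h = host + (size_t)c * n;
            float *d = dev + (size_t)c * n;
            cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, s);
            scale<<<(n + 255) / 256, 256, 0, s>>>(d, n, 2.0f);
            cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, s);
        }

        if (ok) {
            // wait for everything queued in this stream, and only then destroy it
            if (cudaStreamSynchronize(s) != cudaSuccess) ++failures;
            cudaStreamDestroy(s);
        }
    }

    // every element started at 1.0f and should have been doubled exactly once
    int wrong = failures;
    for (size_t i = 0; i < total; ++i)
        if (host[i] != 2.0f) ++wrong;

    cudaFree(dev);
    cudaFreeHost(host);
    return wrong;
}

int main()
{
    int result = run_per_thread_streams(4, 1 << 16);
    printf("failures: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

assuming gcc as the host compiler, something like `nvcc -Xcompiler -fopenmp` should build it; without the openMP flag the pragmas are ignored and it simply runs with one thread and one stream. whether the per-thread streams actually overlap on your device is something only the profiler can tell you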