Boosting performance by not using the default CUDA stream

Overview:
We are profiling our system, which is built on top of the DeepStream SDK. Our pipeline consists of DeepStream, GStreamer and eyezon plugins; its details are given below.
It seems that some of the plugins delivered with DeepStream use the default CUDA stream. Because the legacy default stream implicitly synchronizes with all blocking streams, this serializes work from other threads and limits performance.

Question:
Is there any way to run these plugins on a stream other than the default stream?

Details of our pipeline:

/root/shop/third_party/pylon_gstreamer/Samples/demopylongstreamer/demopylongstreamer -camera 107 -framerate 10 -rotate 180 -set_pts_micro -parse 'queue ! nvvideoconvert gpu-id=1 ! capsfilter caps="video/x-raw(memory:NVMM), format=RGBA" ! .sink_0 nvstreammux live-source=1 width=2064 height=2064 batch-size=1 gpu-id=1 nvbuf-memory-type=0 ! nvvideoconvert gpu-id=1 ! queue ! yoloplugin gpu-id=1 camera-name=107 ! fakesink'

Inside demopylongstreamer there is a sub-pipeline consisting of:
camera_source, queue, converter, rescaler, rescalerCaps, rotator, finalConverter, finalFilter

DeepStream does not use the default CUDA stream; I checked the code, as shown below.

gstnvinfer.cpp
line 841: cudaReturn = cudaStreamCreateWithFlags (&nvinfer->convertStream, cudaStreamNonBlocking);

nvdsinfer_context_impl.cpp
line 528: cudaReturn = cudaStreamCreateWithFlags(&m_PreProcessStream, cudaStreamNonBlocking);
line 540: cudaReturn = cudaStreamCreateWithFlags(&m_InferStream, cudaStreamNonBlocking);
line 552: cudaReturn = cudaStreamCreateWithFlags (&m_BufferCopyStream, cudaStreamNonBlocking);

gstnvstreammux.c
line 2151: cudaStreamCreate (&mux->stream);
line 655: cudaStreamCreate (&mux->nppStream);

gstnvvideoconvert.c
line 2296: cudaStreamCreateWithFlags(&(space->config_params.cuda_stream), cudaStreamNonBlocking);

Thanks,
I will look at other places in my code.