Drop in performance while running 2 CUDA applications in parallel

I have 2 GPU applications (one of them is a 3rd-party application with no access to its source code). When they run sequentially, the total runtime for one cycle is 80-83 ms, with a maximum GPU utilization of 40-45%. To run the 2 applications in parallel, I added a CUDA stream in my code and call the 2 applications in parallel using threads (OpenMP). I expected both applications to complete within 50 ms, but one cycle actually takes 60-65 ms, with a maximum GPU utilization of 70%.
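For reference, the setup described above (one CUDA stream per workload, launched from two OpenMP threads) looks roughly like the sketch below. It is only an illustration under stated assumptions: `busyKernel` and `runConcurrent` are hypothetical stand-ins for the two applications' GPU work, both workloads are assumed to be reachable from one process, and the build command in the comment is an assumption.

```cuda
// Sketch: two independent GPU workloads launched concurrently from two
// OpenMP threads, each on its own non-blocking stream.
// Build (assumption): nvcc -O2 -Xcompiler -fopenmp concurrent.cu -o concurrent
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <omp.h>

#define CHECK(call)                                              \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      std::fprintf(stderr, "%s failed: %s\n", #call,             \
                   cudaGetErrorString(err_));                    \
      std::exit(EXIT_FAILURE);                                   \
    }                                                            \
  } while (0)

// Hypothetical stand-in for one application's GPU work.
__global__ void busyKernel(float* data, int n, int iters) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    float v = data[i];
    for (int k = 0; k < iters; ++k) v = v * 0.999f + 0.001f;
    data[i] = v;
  }
}

// Runs two workloads, one per OpenMP thread and stream, and returns the
// wall-clock time in milliseconds until both have finished.
double runConcurrent(int n, int iters) {
  float* buf[2];
  cudaStream_t stream[2];
  for (int s = 0; s < 2; ++s) {
    CHECK(cudaMalloc(&buf[s], n * sizeof(float)));
    CHECK(cudaMemset(buf[s], 0, n * sizeof(float)));
    // Non-blocking streams do not synchronize with the legacy default stream.
    CHECK(cudaStreamCreateWithFlags(&stream[s], cudaStreamNonBlocking));
  }
  CHECK(cudaDeviceSynchronize());

  double t0 = omp_get_wtime();
  // Each loop iteration (workload) is handled by one of up to two threads.
  #pragma omp parallel for num_threads(2)
  for (int s = 0; s < 2; ++s) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    busyKernel<<<blocks, threads, 0, stream[s]>>>(buf[s], n, iters);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream[s]));
  }
  double elapsedMs = (omp_get_wtime() - t0) * 1000.0;

  for (int s = 0; s < 2; ++s) {
    CHECK(cudaStreamDestroy(stream[s]));
    CHECK(cudaFree(buf[s]));
  }
  return elapsedMs;
}

int main() {
  double ms = runConcurrent(1 << 20, 1000);
  assert(ms >= 0.0);
  std::printf("both workloads finished in %.2f ms\n", ms);
  return 0;
}
```

Two points behind this design: kernels on different streams can only overlap when each one leaves some of the GPU's resources free, so a kernel that already occupies the whole device still runs largely back to back with the other; and work issued on the legacy default stream (stream 0) implicitly synchronizes with blocking streams, which `cudaStreamNonBlocking` avoids on the side you control.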
What could be a possible solution that lets both applications run in parallel within 50 ms?