Inference with two engines

I am having difficulty chaining two engines: the output of the first engine is the input of the second.
Currently I copy the first engine's output to CPU memory and then send it back to device memory for the second engine.
I would like to avoid that transfer time. Is it possible to run inference with both engines without the round trip through the CPU? Are there any samples?