Does CPU performance drop after GPU (CUDA) processing on the TK1?

I’m using a TK1 board for image processing.
I made 2 versions of my code: one runs the SGM algorithm and the ROI algorithm both on the CPU, and the other runs the SGM algorithm on the GPU with CUDA and the ROI algorithm on the CPU.

The CPU (ROI algorithm) code is identical in both versions.
But its processing time is different: in the GPU + CPU version it is 2 times slower.
When I tested on a PC, the processing time was the same for both versions, on both Linux and Windows.

So my guess is that the TK1 board needs some time after CUDA processing before the CPU returns to full performance.

Is that right?
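
One way to check this guess would be to run the CPU stage several times back to back right after the GPU stage and compare the individual timings. If the first runs are slow and later runs get faster, the CPU is probably ramping up. The following is only a minimal sketch in Python, not the actual code: `cpu_roi_stage` is a hypothetical stand-in for the ROI algorithm, and `time_repeated` is a helper name made up for this example.

```python
import time


def time_repeated(stage, repeats=10):
    """Run `stage` several times in a row and return each run's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        stage()
        durations.append(time.perf_counter() - start)
    return durations


def cpu_roi_stage():
    # Hypothetical stand-in for the CPU (ROI algorithm) code.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    # In the real program this loop would start right after the GPU (SGM) stage finishes.
    for run, seconds in enumerate(time_repeated(cpu_roi_stage), start=1):
        print(f"run {run}: {seconds * 1000:.2f} ms")
```

If every run takes about the same time, a slow ramp-up is probably not the explanation.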

Try the tips here
http://elinux.org/Jetson/Performance
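
Before and after changing any settings, it may help to look at the state the CPU is actually in. The sketch below reads the standard Linux cpufreq files under `/sys/devices/system/cpu`. That the TK1 kernel exposes all of these files is an assumption, so unreadable ones are reported as `None`. The helper names `read_text` and `cpu_state` are made up for this example.

```python
from pathlib import Path

# Standard Linux sysfs location for per-CPU information.
CPU_ROOT = Path("/sys/devices/system/cpu")


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def cpu_state(root=CPU_ROOT):
    """Collect the online-core list and per-core frequency scaling information."""
    state = {"online": read_text(root / "online")}
    for cpu_dir in sorted(root.glob("cpu[0-9]*")):
        freq = cpu_dir / "cpufreq"
        state[cpu_dir.name] = {
            "governor": read_text(freq / "scaling_governor"),
            "cur_khz": read_text(freq / "scaling_cur_freq"),
            "max_khz": read_text(freq / "cpuinfo_max_freq"),
        }
    return state


if __name__ == "__main__":
    for name, info in cpu_state().items():
        print(name, info)
```

If the current frequency sits well below the maximum, or some cores are missing from the online list while the CPU stage runs, that would be consistent with the slowdown described above.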

I think you’re a genius.
It worked perfectly.

Additionally, with these options applied, the whole processing time was reduced by about half.

Thank you so much.
If you live in Korea, I’d like to buy you dinner. ^^