Unexpectedly poor performance with multiple GPUs

I am testing performance with two C1060s, but I can’t make sense of the results.
I ran the ‘simpleMultipleGPUs’ test program from the SDK, and performance was worse with multiple devices than with a single device.

Several earlier posts about multiple GPUs attributed the poor performance to thread overhead.
But I don’t think that is the cause. Execution looks serialized even though multiple GPUs are in use.
I also tried my own test program, and it shows the same behavior.

  • OS: Linux
  • driver version: 177.70 (for 1070 only)

If anyone has experience with multiple GPUs, please tell me how to test this.