Multi-GPU does not accelerate my program

This is my test result on 4 1080Ti GPUs:

This is the test result on 1 1080Ti GPU:

I only changed the batch size from 32 to 128; the dataset and model are the same.
Why is the time 300 ms/batch on 4 1080Ti GPUs instead of 80 ms?
When I increase the number of GPUs to 4, the amount of data per batch also increases 4 times, so shouldn't the time per batch stay close?
Can someone help me?

The folks on the PyCUDA/PyTorch forums will probably be able to give you better information on what Python is doing.
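
In the meantime, it may help to compare the two runs by throughput (samples per second) rather than by time per batch, since the batch sizes differ. The sketch below only redoes the arithmetic from the question. It assumes that 80 ms/batch is the single-GPU time at batch size 32, which the question implies but does not state outright, and the function name `throughput` is just an illustrative helper.

```python
def throughput(batch_size, ms_per_batch):
    """Samples processed per second for a given batch size and time per batch."""
    return batch_size * 1000.0 / ms_per_batch


# Numbers taken from the question. ASSUMPTION: 80 ms/batch is the
# 1-GPU time at batch size 32.
single_gpu = throughput(batch_size=32, ms_per_batch=80.0)
multi_gpu = throughput(batch_size=128, ms_per_batch=300.0)

num_gpus = 4
speedup = multi_gpu / single_gpu   # how much faster the 4-GPU run really is
efficiency = speedup / num_gpus    # fraction of ideal linear scaling

print(f"1 GPU : {single_gpu:.1f} samples/s")
print(f"4 GPUs: {multi_gpu:.1f} samples/s")
print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.0%}")
```

Under that assumption the 4-GPU run processes roughly 427 samples/s against 400 samples/s on one GPU, so it is only slightly faster overall, far from the 4x that ideal scaling would give. That matches the observation in the question: the extra GPUs are adding almost no throughput.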