Why is multi-GPU slower than a single GPU?

When I train a neural network using CUBLAS on Ubuntu 10.10 with CUDA 4.0,
I find it strange that using only one GPU (a single GTX 590) is faster
than using two GPUs (two GTX 590s in one PC) with the same configuration.
Why are two GPUs, with more cores, beaten by a single GPU?


It depends on your implementation. Could you provide us with your code?

See also http://forums.nvidia.com/index.php?showtopic=197764

Thanks for your reply. My code is very long, but the major part is as follows.

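void GPU_forward_bunch(size_t frames_this_bunch, QN_MLP_BunchFl3 *mlp)
{
    // First layer: d_hidden (frames x n_hidden) =
    //   d_input^T (frames x (n_input+1)) * d_in2hid ((n_input+1) x n_hidden)
    cublasSgemm('T', 'N', frames_this_bunch, mlp->n_hidden, mlp->n_input + 1,
                1.0f, d_input, mlp->n_input + 1, d_in2hid, mlp->n_input + 1,
                0.0f, d_hidden, frames_this_bunch);

    // Apply the sigmoid to the hidden-layer activations, 256 threads per block.
    int grid_size = (frames_this_bunch * (1 + mlp->n_hidden)) / 256 + 1;
    GPU_sigmoid<<<grid_size, 256>>>(d_hidden, frames_this_bunch * mlp->n_hidden,
                                    frames_this_bunch * (mlp->n_hidden + 1));

    // Second layer: d_output (frames x n_output) =
    //   d_hidden (frames x (n_hidden+1)) * d_hid2out ((n_hidden+1) x n_output)
    cublasSgemm('N', 'N', frames_this_bunch, mlp->n_output, mlp->n_hidden + 1,
                1.0f, d_hidden, frames_this_bunch, d_hid2out, mlp->n_hidden + 1,
                0.0f, d_output, frames_this_bunch);

    // Softmax over the outputs.
    grid_size = frames_this_bunch / 256 + 1;
    GPU_softmax<<<grid_size, 256>>>(d_output, mlp->n_output, frames_this_bunch);
}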
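
A side note on the launch configuration used above: `n / 256 + 1` launches one extra block whenever `n` is an exact multiple of 256. That is harmless only if the kernels bounds-check their indices, which cannot be confirmed here because `GPU_sigmoid` and `GPU_softmax` are not shown. Below is a minimal host-side sketch comparing that formula with ceiling division. The helper names `blocks_ceil` and `blocks_plus_one` are made up for illustration and are not part of the posted code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the post): number of blocks needed to
   cover n elements with `block` threads per block, via ceiling division. */
static size_t blocks_ceil(size_t n, size_t block)
{
    assert(block > 0);
    return (n + block - 1) / block;
}

/* The formula used in the post: yields one extra block when n is an
   exact multiple of `block`. */
static size_t blocks_plus_one(size_t n, size_t block)
{
    assert(block > 0);
    return n / block + 1;
}
```

For `n = 256` the first helper gives 1 block and the second gives 2. For `n = 257` both give 2. Note that `blocks_ceil(0, 256)` is 0, and a launch with zero blocks is an invalid configuration, so an empty bunch should skip the launch.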