GPU is Slower than CPU!

Hello,

I have 1000 equations like the following one, computed on the CPU in C++:

[codebox]h_F[j] = (delta[j] - h*slip[j]) - pre_delta[j] - h*pre_slip[j];[/codebox]

I was evaluating these equations on the host and transferring the vector h_F to the GPU at each time step. To reduce the CPU-GPU communication traffic, I want to evaluate all 1000 equations entirely on the GPU using CUBLAS. I wrote the following code to implement the above equation:

[codebox]cublasScopy(N_detail, d_delta, 1, d_F1, 1);            // d_F1 = delta
cublasSaxpy(N_detail, -1, d_pre_delta, 1, d_F1, 1);    // d_F1 = delta - pre_delta
cublasScopy(N_detail, d_slip, 1, d_RESULT1, 1);        // d_RESULT1 = slip
cublasSaxpy(N_detail, 1, d_pre_slip, 1, d_RESULT1, 1); // d_RESULT1 = slip + pre_slip
cublasSaxpy(N_detail, -h, d_RESULT1, 1, d_F1, 1);      // d_F1 = (delta - pre_delta) - h*(slip + pre_slip)[/codebox]

However, this code takes significantly more time to run on the GPU than its twin does on the CPU! I have a large set of data, so I expected this code to run faster on the GPU. Why does it take more time than the CPU version? Do you have any suggestions for making this GPU code more efficient?

I really need to reduce the computation time as much as possible, so please let me know your ideas or suggestions.

Thanks.

The quick answer is that your problem is dominated by host-device memory transfers and per-call overhead, not by math. Each of your arrays is only 1000 single-precision values (about 4 KB), and the code issues five separate CUBLAS operations, so the overhead of merely sending the data to the GPU and launching that work is more than the time the CPU takes to do the whole computation.

GPU computing wins when you have many complex math operations to perform on each piece of data, ideally keeping all the data resident on the device and sending very little back and forth to the CPU.

You may have a lot of data, but you don't have much work for the GPU to do on that data.
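That said, if the inputs can stay on the device (for example, because they are produced by an earlier GPU computation each time step), one easy improvement is to replace the five CUBLAS calls with a single fused kernel, so there is only one launch and each element is read and written once. Here is a minimal sketch, not a drop-in implementation; the names d_delta, d_pre_delta, d_slip, d_pre_slip, d_F1, N_detail, and h follow your post, and everything else (kernel name, launch configuration) is assumed:

[codebox]// Fused elementwise kernel:
// F[j] = (delta[j] - pre_delta[j]) - h * (slip[j] + pre_slip[j])
__global__ void computeF(const float *delta, const float *pre_delta,
                         const float *slip,  const float *pre_slip,
                         float h, float *F, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < n)
        F[j] = (delta[j] - pre_delta[j]) - h * (slip[j] + pre_slip[j]);
}

// One kernel launch per time step instead of five CUBLAS calls:
int threads = 256;
int blocks  = (N_detail + threads - 1) / threads;
computeF<<<blocks, threads>>>(d_delta, d_pre_delta, d_slip, d_pre_slip,
                              h, d_F1, N_detail);[/codebox]

Even then, with only 1000 elements the arithmetic itself takes a few microseconds at most; launch overhead and any remaining host-device copies will still dominate. The real speedup only appears if the rest of your time-stepping loop also moves onto the GPU so the data never has to leave the device.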