Hello,
I have 1000 equations like the following one, currently computed on the CPU in C++:
[codebox] h_F[j] = (delta[j] - h*slip[j]) - pre_delta[j] - h*pre_slip[j];[/codebox]
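For context, the whole host-side computation is just a loop over that expression, followed by a copy of h_F to the device. This is a minimal sketch of what I run each time-step; the cudaMemcpy call and the device pointer d_F1 are reconstructed from my CUBLAS version below:
[codebox]// per time-step, on the CPU:
for (int j = 0; j < N_detail; ++j)
    h_F[j] = (delta[j] - h*slip[j]) - pre_delta[j] - h*pre_slip[j];

// then ship the result to the GPU (destination pointer assumed to be d_F1):
cudaMemcpy(d_F1, h_F, N_detail * sizeof(float), cudaMemcpyHostToDevice);[/codebox]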
I compute these on the host and transfer the vector h_F to the GPU at every time-step. To reduce the CPU-GPU communication traffic, I now want to evaluate all 1000 equations entirely on the GPU using CUBLAS. I wrote the following code for the equation above:
[codebox] cublasScopy(N_detail, d_delta, 1, d_F1, 1);               // d_F1 = delta
cublasSaxpy(N_detail, -1.0f, d_pre_delta, 1, d_F1, 1);    // d_F1 = delta - pre_delta
cublasScopy(N_detail, d_slip, 1, d_RESULT1, 1);           // d_RESULT1 = slip
cublasSaxpy(N_detail, 1.0f, d_pre_slip, 1, d_RESULT1, 1); // d_RESULT1 = slip + pre_slip
cublasSaxpy(N_detail, -h, d_RESULT1, 1, d_F1, 1);         // d_F1 -= h * (slip + pre_slip)[/codebox]
However, this GPU code takes significantly more time to run than its CPU twin!! I have a large set of data, so I expected it to run faster than the CPU version. Why does it take more time than the CPU? Do you have any suggestions for making this GPU code more efficient?
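One thing I wonder about: since N_detail is only 1000, maybe the overhead of launching five separate CUBLAS kernels dominates the actual arithmetic. Would fusing everything into one custom kernel help? Something like the sketch below is what I have in mind (untested, and the grid/block sizes are arbitrary choices of mine):
[codebox]// One fused kernel instead of five CUBLAS calls: each input vector is read
// once and d_F1 is written once per time-step.
__global__ void compute_F(int n, float h,
                          const float *delta, const float *pre_delta,
                          const float *slip,  const float *pre_slip,
                          float *F)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < n)
        F[j] = (delta[j] - h*slip[j]) - pre_delta[j] - h*pre_slip[j];
}

// launched once per time-step, one thread per equation:
compute_F<<<(N_detail + 255) / 256, 256>>>(N_detail, h, d_delta, d_pre_delta,
                                           d_slip, d_pre_slip, d_F1);[/codebox]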
I really need to reduce the computation time as much as possible, so please let me know your ideas or suggestions.
Thanks.