How to do the top-level parallelization on the GPU?

Dear all,

I am a beginner in GPU programming. I want to know whether it is possible to do the top-level parallelization on the GPU.

Take the following example code:

for (int i = 0; i < N; i++)
{
    ZGEMM(&trans, &trans, &mm, &nn, &kk, &alpha, A[i], &lda, B[i], &ldb, &beta, C[i], &ldc);
}

The current parallelization method is:

for (int i = 0; i < N; i++)
{
    cublasZgemm(&trans, &trans, &mm, &nn, &kk, &alpha, A[i], &lda, B[i], &ldb, &beta, C[i], &ldc);
}

However, I would like to keep the ZGEMM calls unchanged (for example, still using MKL's library) and instead parallelize the program at the level of the "for" loop. Is this possible? If so, how should I modify the code?
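
To make "parallelizing at the level of the for loop" concrete, here is a minimal sketch of what it looks like on the CPU with OpenMP. It is only an illustration: zgemm_naive and batch_zgemm are hypothetical names, zgemm_naive is a simple stand-in for the MKL ZGEMM call (column-major, no transposes), and the sketch assumes every iteration works on its own A[i], B[i] and C[i]. Whether the GPU can take the place of the OpenMP threads here is the open question.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for MKL's ZGEMM: C = alpha*A*B + beta*C,
   column-major, no transposes; A is m x k, B is k x n, C is m x n. */
static void zgemm_naive(int m, int n, int k, double complex alpha,
                        const double complex *A, int lda,
                        const double complex *B, int ldb,
                        double complex beta,
                        double complex *C, int ldc)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            double complex sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i + (size_t)p * lda] * B[p + (size_t)j * ldb];
            C[i + (size_t)j * ldc] = alpha * sum + beta * C[i + (size_t)j * ldc];
        }
    }
}

/* Loop-level parallelism: each iteration is an independent product,
   so the iterations can be spread over CPU threads.
   Build with -fopenmp (GCC/Clang); without it the pragma is ignored
   and the loop runs serially with the same result. */
static void batch_zgemm(int count, int m, int n, int k, double complex alpha,
                        double complex *const *A, int lda,
                        double complex *const *B, int ldb,
                        double complex beta,
                        double complex *const *C, int ldc)
{
    #pragma omp parallel for
    for (int i = 0; i < count; i++)
        zgemm_naive(m, n, k, alpha, A[i], lda, B[i], ldb, beta, C[i], ldc);
}
```

With the variables from the loops above, the call would be batch_zgemm(N, mm, nn, kk, alpha, A, lda, B, ldb, beta, C, ldc). The body of the loop stays a plain CPU routine, which is the property I would like to keep.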

Thanks,
Zhanghong Tang