I just updated to the newest version of pgcc and now it seems to work, but I'm still having a problem at execution.
During execution the program prints an “Invalid handle” error.
My code is this:
int sizeR = numRows1*numRows2;
#pragma omp parallel num_threads(2) private(result)
{
    int th= omp_get_thread_num();
    #if _OPENACC
    acc_init(acc_device_nvidia);
    acc_set_device_num(th+1,acc_device_nvidia);
    #endif
    fprintf(stdout,"THREAD(%d) - Launched thread.\n",th);
    fprintf(stdout,"THREAD(%d) - Selected device: %d\n",th,acc_get_device_num(acc_device_nvidia));
    int bI = th*(numRows1/2);
    int eI = numRows1/((!th)+1);
    fprintf(stdout,"THREAD(%d) - begin I: %d, end I: %d\n",th,bI,eI);
    int bR = th*(sizeR/2);
    int eR = (sizeR/((!th)+1));
    fprintf(stdout,"THREAD(%d) - size R: %d, begin R: %d, end R: %d\n",th,sizeR,bR,eR);
    result = &result[bR];
    #pragma acc kernels copyin(m1[0:numRows1*numColumns1],m2[0:numRows2*numColumns2]), copyout(result[0:eR-bR])
    {
        int i = bI;
        #pragma acc loop gang vector(256), independent
        for (i=0;i<eI;i++)
        {
            int j;
            #pragma acc loop gang vector(2) independent
            for(j=0;j<numRows2;j++)
            {
                real_t acum = 0;
                int k;
                for(k=0;k<numColumns1;k++) {
                    acum += m1[i+k*numColumns1] * m2[j*numColumns2+k];
                }
                result[(i-bI)*numRows1+j] = acum;
            }
        }
    }
}
I use a matrix size of 5000x5000
and the output is this:
THREAD(0) - Launched thread.
THREAD(0) - Selected device: 1
THREAD(0) - begin I: 0, end I: 50
THREAD(0) - size R: 10000, begin R: 0, end R: 5000
THREAD(1) - Launched thread.
THREAD(1) - Selected device: 2
THREAD(1) - begin I: 50, end I: 100
THREAD(1) - size R: 10000, begin R: 5000, end R: 10000
call to cuLaunchKernel returned error 400: Invalid handle
call to cuMemFree returned error 700: Launch failed
Unfortunately, all this tells me is that the kernel failed for some reason. To narrow down the issue, can you try running with a single OpenMP thread? Also, try removing the schedule clauses, i.e., the gang and vector clauses, and let the compiler schedule the loops.
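As a rough illustration of that test, here is a minimal sketch of the same multiplication run from a single thread, with no gang or vector clauses, so the compiler picks the schedule. It is not a drop-in replacement for the posted code: the function name multiply_single_thread is hypothetical, real_t is assumed to be float, both matrices are assumed to be stored row-major with the same number of columns, and each result element is taken to be the dot product of row i of m1 with row j of m2.

```c
#include <assert.h>

typedef float real_t; /* assumption: the post does not show how real_t is defined */

/* Hypothetical helper for a single-thread test run.
 * result[i*numRows2 + j] = dot product of row i of m1 and row j of m2.
 * Assumes row-major storage and numColumns columns in both m1 and m2. */
void multiply_single_thread(const real_t *m1, const real_t *m2, real_t *result,
                            int numRows1, int numRows2, int numColumns)
{
    int i, j, k;

    assert(numRows1 > 0 && numRows2 > 0 && numColumns > 0);

    /* No gang/vector clauses: the compiler chooses the schedule.
     * Without OpenACC enabled, the pragmas are ignored and this runs on the host. */
    #pragma acc kernels copyin(m1[0:numRows1*numColumns], m2[0:numRows2*numColumns]) copyout(result[0:numRows1*numRows2])
    {
        #pragma acc loop independent
        for (i = 0; i < numRows1; i++) {
            #pragma acc loop independent
            for (j = 0; j < numRows2; j++) {
                real_t acum = 0;
                for (k = 0; k < numColumns; k++) {
                    acum += m1[i*numColumns + k] * m2[j*numColumns + k];
                }
                result[i*numRows2 + j] = acum;
            }
        }
    }
}
```

If this version runs cleanly on one device, adding the second OpenMP thread and then the schedule clauses back one at a time should show which change triggers the failure.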