Multiple GPUs error cuLaunchKernel 400

I am testing the multi-GPU example on page 16 of the PGI Accelerator Compilers OpenACC Getting Started Guide v13.2. Using 1 OpenMP thread and 1 GPU works fine, but with 2 OpenMP threads and 2 GPUs I get the error message "cuLaunchKernel 400: Invalid handle". The machine has 2 M2070 GPUs, the latest NVIDIA driver for Oracle Linux 6.3, and the PGI CDK 13.2.

I have the same problem :S

Sorry about that. This is a known issue that has been fixed in the 13.3 release (just released yesterday March 12th on Linux, Windows to follow here shortly).

% setenv OMP_NUM_THREADS 2
% pgf90 -acc multi.f90 -mp -lacml -V13.2 ; a.out
 Host Serial    2489.612915315794     
call to cuLaunchKernel returned error 400: Invalid handle
call to cuMemcpyDtoHAsync returned error 4: Deinitialized
%
% pgf90 -acc multi.f90 -mp -lacml -V13.3 ; a.out
 Host Serial    2489.612915315794     
 Multi-Device Parallel    2489.612915315794
- Mat

Hi Mat, as you told us, this issue is solved in ver 13.3 for our code too.

Thank you,
Manolis

Hi, I have the same problem using PGI 14.3 on my Windows machine. This machine has 4 GeForce GTX 780 Ti cards. Using 1 OpenMP thread with 1 GPU works fine, but when I try to use 2 OpenMP threads, each driving one GPU, I get this error. Here is the code snippet:

#pragma omp parallel num_threads(2)
	{
		int i, j, k;
		int id, blocks, start, end;
		id = omp_get_thread_num();
		blocks = n/threads;
		start = id*blocks;
		end = (id+1)*blocks;
		acc_set_device_num(id+2, acc_device_nvidia);

		printf("copying %d\n", id);
		#pragma acc data copyin(aa[start*n:blocks*n])\
						 copyin(bb[0:n*n])\
						 copyout(cc[start*n:blocks*n])
		{
		
			printf("kernel %d\n", id);
			#pragma acc kernels loop collapse(2) private(j,k)
			for(i=start; i<end; i++)
				for(j=0; j<n; j++)
				{
					float c = 0.0f;
					for(k=0; k<n; k++)
						c += aa[i*n+k] * bb[k*n+j];
					cc[i*n+j] = c;
				}
		}
		
		printf("after kernel %d\n", id);
	}

And the output:

copying 0
copying 1
kernel 1
kernel 0
call to cuLaunchKernel returned error 400: Invalid handle

My compilation command:

pgcc -acc -mp -V14.3 -Minfo=accel -fast multi.c

Thanks
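
A side note for anyone adapting the snippet above: `blocks = n/threads` leaves the trailing rows unassigned whenever `n` is not a multiple of the thread count. This is not offered as the cause of the cuLaunchKernel error, only as a minimal sketch of a partition that covers every row. The helper names `row_range` and `print_sections` are hypothetical and do not come from the original post.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): split n rows across
 * `parts` workers so that every row is assigned exactly once, even when
 * n is not a multiple of parts. The first (n % parts) workers each get
 * one extra row. */
void row_range(int n, int parts, int id, int *start, int *count)
{
    assert(parts > 0 && id >= 0 && id < parts);
    int base  = n / parts;   /* rows every worker gets */
    int extra = n % parts;   /* leftover rows to hand out one by one */
    *count = base + (id < extra ? 1 : 0);
    *start = id * base + (id < extra ? id : extra);
}

/* Print the row range and the matching array section (offset:length,
 * in elements of an n-by-n row-major matrix) for each worker. */
void print_sections(int n, int parts)
{
    for (int id = 0; id < parts; id++) {
        int start, count;
        row_range(n, parts, id, &start, &count);
        printf("worker %d: rows [%d, %d), section aa[%d:%d]\n",
               id, start, start + count, start * n, count * n);
    }
}
```

Inside the parallel region, each thread would call `row_range(n, threads, id, &start, &blocks)` and set `end = start + blocks`, leaving the data clauses and the loop bounds in their current form.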

Thanks miki_zizou. I’ve recreated the error here, filed a problem report (TPR#20174), and sent it off to engineering for further investigation.

Best Regards,
Mat

Is there a solution available?

I encountered this problem in PGI 17.5 too.

Hi IngoSchulz85971,

TPR#20174 was fixed a while ago, and I double-checked that the error miki_zizou was seeing does not occur with 17.5. Hence, while the error message may be the same, its cause is different.

Can you please post, or send to PGI Customer Service (trs@pgroup.com), a reproducing example as well as more information about your environment, such as the OS, target devices, compilation flags, and details on how you run the program?

Thanks,
Mat
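
When putting together a reproducing example, it can help to record how many devices the runtime sees and which device each thread ends up on. Below is a minimal sketch, assuming the zero-based device numbering of the C binding. `device_for_thread`, `visible_devices`, and `bind_thread` are hypothetical helper names; only `acc_get_num_devices` and `acc_set_device_num` are OpenACC runtime routines. The OpenACC calls are guarded by `_OPENACC`, so without `-acc` the code still compiles and simply reports 0 devices.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: map a thread id to a device number, counting up
 * from first_device. Returns -1 if that device is out of range. */
int device_for_thread(int thread_id, int first_device, int num_devices)
{
    int dev = first_device + thread_id;
    if (thread_id < 0 || first_device < 0 || dev >= num_devices)
        return -1;
    return dev;
}

/* NVIDIA devices visible to the OpenACC runtime; 0 when built without
 * OpenACC support. */
int visible_devices(void)
{
#ifdef _OPENACC
    return acc_get_num_devices(acc_device_nvidia);
#else
    return 0;
#endif
}

/* Bind the calling thread to its device. Returns the device number
 * used, or -1 (with a message on stderr) if it is not available. */
int bind_thread(int thread_id, int first_device)
{
    int ndev = visible_devices();
    int dev = device_for_thread(thread_id, first_device, ndev);
    if (dev < 0) {
        fprintf(stderr, "thread %d: device %d not available (%d visible)\n",
                thread_id, first_device + thread_id, ndev);
        return -1;
    }
#ifdef _OPENACC
    acc_set_device_num(dev, acc_device_nvidia);
#endif
    return dev;
}
```

Calling `bind_thread(omp_get_thread_num(), 0)` at the top of the parallel region and including its return value in the report gives the device count and the thread-to-device mapping asked for above.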

Hi, I sent a reproducing example and am now waiting for a response :)