Error when program reaches GPU code

Hello everyone,

Having overcome the issue I had earlier (see my other post, OpenACC "declare link" with routine called in target region), I have backported the solution to the larger Fortran program.

The code compiles fine with version 19.10 Community Edition. Execution of the program starts, but when it reaches the code that has to be executed on the GPU, I now get the following error:

  • When compiling with CUDA 9.2 (pgfortran isola15.f90 common_vars.f90 parameters.f90 -O4 -acc -ta=tesla,cc35 -Minfo=accel -Mcuda=cuda9.2 -o isola15c):
    line 325: cudaLaunchKernel returned status 11: invalid argument
  • When compiling with CUDA 10.0 (pgfortran isola15.f90 common_vars.f90 parameters.f90 -O4 -acc -ta=tesla,cc35 -Minfo=accel -Mcuda=cuda10.0 -o isola15c):
    line 325: cudaLaunchKernel returned status 11: invalid argument
  • When compiling with CUDA 10.1 (pgfortran isola15.f90 common_vars.f90 parameters.f90 -O4 -acc -ta=tesla,cc35 -Minfo=accel -Mcuda=cuda10.1 -o isola15c):
    line 325: cudaLaunchKernel returned status 1: invalid argument

I am not certain how to debug this further, as the kernel and the arguments passed to it are generated by the compiler.

It is also strange that the test program from my other post now works without an issue, while applying the same solution to the larger program does not.

Please help!

Hi Ioannis,

I tried replicating the error using the code you previously posted, but it compiled and ran correctly for me (though my K80 system uses CUDA 10.2). Have you made any additional changes?

My best guess is that you’ve set “num_gangs”, “num_workers”, and/or “vector_length” to values that are too large or otherwise incompatible with your device. I’ve seen a similar “cudaLaunchKernel returned status 1: invalid argument” error before when this occurred.
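
For example, something along these lines (a contrived sketch, not your code) would fail at launch on a cc35 device: with our OpenACC implementation, workers typically map to threadIdx%y and vector lanes to threadIdx%x, so the thread block here would be 64 x 256 = 16384 threads, well above the 1024-thread block limit:

    program launch_too_big
       implicit none
       integer, parameter :: n = 100000
       integer :: i
       real :: a(n), b(n)
       b = 1.0
    ! 64 workers * 256 vector lanes = 16384 threads per block, which
    ! exceeds the 1024-thread block limit on cc35, so the launch fails
    ! with "cudaLaunchKernel returned status ...: invalid argument".
    !$acc parallel loop num_workers(64) vector_length(256)
       do i = 1, n
          a(i) = 2.0 * b(i)
       end do
       print *, a(n)
    end program launch_too_big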

-Mat

Hello Mat,

As I mentioned, the test code works; it was the original (large) Fortran code that had the problem after backporting the solution. I figured out just a couple of hours ago what the problem was: one of the subroutines called within the target region was declaring a very large local array (200000 reals) per thread, which is not supported on the GPU. Thankfully, for the problem under consideration the array can be much smaller (~10000 reals are enough). The error message was quite misleading, though.
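
To illustrate the pattern (with invented names; the real subroutine is of course much bigger):

    ! Each GPU thread gets its own copy of the local array "work", and
    ! 200000 reals (800,000 bytes) per thread exceeds the per-thread
    ! local memory the device can provide (512 KB on this device class),
    ! so the kernel launch fails.
    subroutine inner_solver(x)
    !$acc routine seq
       implicit none
       real, intent(inout) :: x
       real :: work(200000)   ! too large; ~10000 reals suffice for my case
       work(1) = x
       x = work(1) + 1.0
    end subroutine inner_solver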

After correcting the above, the last error I was getting in the parallel region was a misaligned memory address access. That was easier to find: in one of the subroutines a CHARACTER*3 array is declared and passed on to other subroutines. The compiler didn’t like the fact that there were 3 characters (bytes) per element in the array. I changed all of these declarations to CHARACTER*4 and that solved it. Maybe the compiler should apply some automatic padding in such cases?
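
The change was essentially the following (array name invented for illustration):

    ! Before: 3-byte elements, so elements sit at offsets 0, 3, 6, ...
    ! and are only byte-aligned; accessing them on the GPU raised the
    ! "misaligned address" error.
    ! character*3 :: labels(100)
    ! After: padding each element to 4 bytes restores alignment.
    character*4 :: labels(100)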

I am finally getting the correct results from the GPU! But the program crashes a bit later on the CPU (although the serial version of the code, ignoring the OpenACC directives, works correctly). Hopefully I will figure this out too.

Ioannis

PS: In the meantime I have also switched to the NVIDIA HPC SDK 20.7 and nvfortran. My understanding is that this compiler is based on pgfortran, right?

In the meantime I have also switched to the NVIDIA HPC SDK 20.7 and nvfortran. My understanding is that this compiler is based on pgfortran, right?

Correct, nvfortran is just the rebranded and updated pgfortran.
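
Your existing command line should work as-is with the new driver, for example (assuming the CUDA version bundled with 20.7 on your system still targets cc35):

    nvfortran isola15.f90 common_vars.f90 parameters.f90 -O4 -acc -ta=tesla,cc35 -Minfo=accel -o isola15c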