PGCC-S-0000-Internal compiler error. Call in OpenACC region

mn2693 · May 7, 2015, 8:13pm

Hi,

when adding a ‘#pragma acc routine seq’ to a templated class’ constructor, the following error occurs in a different source file:

PGCC-S-0000-Internal compiler error. Call in OpenACC region to support routine - strlen (someSourceFile.cpp: 259)
PGCC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages)  (someSourceFile.cpp: 259)
std::char_traits<char>::length(const char *):
      5, include "someSourceFile.h"
           5, include "map"
               61, include "stl_map.h"
                    63, include "tuple"
                         39, include "array"
                              38, include "stdexcept"
                                   39, include "string"
                                        40, include "char_traits.h"
                                            259, Generating implicit acc routine seq
                                                 Accelerator restriction: unsupported call to support routine 'strlen'

Why does the compiler add the implicit acc routine statement at this point?

I’m using the pgc++ compiler version 15.4.

Thank you
Marco

MatColgrove · May 7, 2015, 11:31pm

Hi Marco,

Why does the compiler add the implicit acc routine statement at this point?

All subroutine calls made from within an OpenACC compute region must have a device version available. This means that the OpenACC “routine” directive must be added to all of these subroutines so the compiler knows to generate the device version.

In C++, adding “routine” to all subroutines would be very laborious. Also, there may be cases where the user doesn’t have access to the subroutine’s definition so can’t explicitly add “routine”. Hence, pgc++ will implicitly add “routine” to all visible subroutines called from within a compute region (as well as all routines that they call).

In this case, it looks like you have a call to “std::char_traits<char>::length”, which in turn calls “strlen”. Since strlen’s source isn’t visible to the compiler, it can’t automatically create the device code.
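To illustrate the point about visible source, here is a minimal sketch that is not taken from Marco's code. The names device_strlen and total_length are hypothetical. It assumes the strings sit in fixed-size slots of one flat buffer, so a single copyin clause is enough. Because the body of device_strlen is visible and marked with "routine seq", the compiler should be able to generate a device version, which it cannot do for the library strlen.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replacement for strlen. Its source is visible to the
// compiler, so a sequential device version can be generated for it.
#pragma acc routine seq
inline std::size_t device_strlen(const char* s)
{
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Sums the lengths of `count` NUL-terminated strings that are stored in
// consecutive fixed-size slots of `stride` bytes inside `buf`.
inline long total_length(const char* buf, int count, int stride)
{
    assert(buf != nullptr && count >= 0 && stride > 0);  // host-side check
    long total = 0;
    #pragma acc parallel loop reduction(+:total) copyin(buf[0:count*stride])
    for (int i = 0; i < count; ++i) {
        total += static_cast<long>(device_strlen(buf + i * stride));
    }
    return total;
}
```

Without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host, which is handy for checking the logic first.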

Mat

mn2693 · May 8, 2015, 8:48am

I understand this, but as far as I know the constructor where the ‘routine’ was added does not call anything in someSourceFile.cpp/h, where the compile error is reported. So I don’t understand why the compiler thinks that the call at line 259 of someSourceFile.cpp has to run on the GPU.

I just tried removing the line or moving it to another position, but the error still occurs at the same line number, where there is now only a comment. How can this happen?

Is there a workaround to run strlen on the GPU? How can I solve this problem?

Marco

MatColgrove · May 8, 2015, 3:42pm

Hi Marco,

I just tried removing the line or moving it to another position, but the error still occurs at the same line number, where there is now only a comment. How can this happen?

My guess is that this isn’t the line where it’s coming from. The compiler wouldn’t try adding it if it wasn’t called. Though, without a reproducing example I can’t tell for sure.

Can you either post or send to PGI Customer Service (trs@pgroup.com) an example?

Thanks,
Mat

mn2693 · May 11, 2015, 2:29pm

Hi Mat,

posting or sending code (or an example) is complicated, since I am working on a large university code for my bachelor thesis.

Maybe I can explain the problem in a different way, I have code looking like this:

#pragma acc parallel loop gang ...
for (int i=0; i<n; i++)
{
  // some Code
  SomeClass x;  // "SomeClass x();" would declare a function, not an object
  // some more code
  #pragma acc loop vector(128) collapse(3)
  for (int j=0; j<n2; j++)
  {
    // some more loops, x used inside here
  }
} // end of this block, so x will be destructed

To realize this, the constructor of SomeClass has to be offloaded to the GPU. Adding

#pragma acc routine seq

results in the error described above at a completely different position in the code.

As a workaround I created a new constructor taking one more argument. This new constructor does the same, but runs on the GPU and will only be called within GPU regions. This looks like this:

class SomeClass {
public:
  SomeClass();            // host-only constructor
  #pragma acc routine seq
  SomeClass(bool gpu);    // does the same work, but is compiled for the GPU
};

This compiles, but now the destructor causes similar problems (EDIT: was ‘same’ before). The workaround I used for the constructor doesn’t work here, since the destructor can’t take arguments.

What is the right way to use C++ classes in OpenACC parallel regions?

I hope this describes my problem better.

Marco

MatColgrove · May 11, 2015, 6:09pm

Hi Marco,

I tend to avoid having classes instantiated within a compute region. While it can be done, besides the issues you’re encountering where the constructor/destructor are more complex in structure, most likely calls to “new/delete” will be made. Dynamic memory allocation from device code can be costly to performance.

How about allocating an array of “SomeClass” on the host, one for each “n”, so you don’t need to execute the constructor/destructor on the device?
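A rough sketch of what that suggestion could look like, under some assumptions: the SomeClass below is a hypothetical stand-in that holds only plain data (no pointers, no new/delete), and scale_rows is an illustrative name. The objects are constructed and destroyed on the host, so neither the constructor nor the destructor needs a device version. Only the already-built array is copied in.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SomeClass: plain data, no allocation inside.
struct SomeClass {
    double scale;
    SomeClass() : scale(2.0) {}  // runs on the host only
};

// Multiplies row i of the n x n2 array `data` by the scale stored in object i.
inline void scale_rows(double* data, int n, int n2)
{
    assert(n >= 0 && n2 >= 0);

    // One object per outer iteration, constructed (and later destroyed)
    // on the host instead of inside the compute region.
    std::vector<SomeClass> objs(n);
    SomeClass* x = objs.data();

    #pragma acc parallel loop gang copyin(x[0:n]) copy(data[0:n*n2])
    for (int i = 0; i < n; ++i) {
        #pragma acc loop vector
        for (int j = 0; j < n2; ++j) {
            data[i * n2 + j] *= x[i].scale;
        }
    }
}
```

If the real class owns dynamically allocated members, a plain copyin of the object array is not enough, because the pointers inside would still refer to host memory.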

Mat

mn2693 · May 13, 2015, 12:56pm

Hi Mat,

The class constructed in the parallel region provides multidimensional tensor access to a flat 1D array, so operator() is overloaded. What is the right way to use this operator in the parallel region? Do I need to create an array of these tensors on the host, copy this tensor array to the GPU, and then copy the data array for each tensor?

Sometimes I get the error ‘Unsupported union data type’. I posted a question about this in this forum some days ago, but I got no answer. Can you tell me what this error means and how I can solve it (or even answer the post: Unsupported union data type)?

Thank you
Marco

P.S.
Jiri Kraus advised me to mention that I work on the ZFS (RWTH Aachen University), so you know which code I’m talking about.

MatColgrove · May 13, 2015, 3:55pm

What is the right way to use this operator in the parallel region?

I hesitate to say what is the “right way” because it is situational. However in general I advise instantiating classes on the host.

Do I need to create an array of these tensors on the host and then copy this tensor array to the GPU and then the data array for each tensor?

Given what you describe, my suggestion would be to create an array of these tensors on the host. I would then use the CUDA Unified Memory Beta features (-ta=tesla:managed) of the 15.x compilers to have the CUDA runtime manage the dynamic memory. This will get you running quicker at the cost of some performance. Once working, you can then go back and optimize the data movement. Note that the UM beta is currently a separate download package.
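As a sketch of how the tensor access could look under this approach (the class TensorView3 and the function fill_with_index_sum are hypothetical, not the actual ZFS classes): the view is built on the host and only wraps a pointer plus the extents, so nothing is allocated or freed on the device. The sketch assumes the underlying array was allocated on the host and the code is built with -ta=tesla:managed, so that the pointer stored in the view is also valid on the GPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning 3D view over a flat 1D array (row-major layout).
class TensorView3 {
public:
    // Host-side constructor: stores the pointer and extents, allocates nothing.
    TensorView3(double* data, int n0, int n1, int n2)
        : data_(data), n0_(n0), n1_(n1), n2_(n2)
    {
        assert(data != nullptr && n0 > 0 && n1 > 0 && n2 > 0);
    }

    // Element access; marked so a sequential device version is generated.
    #pragma acc routine seq
    double& operator()(int i, int j, int k) const
    {
        return data_[(static_cast<std::size_t>(i) * n1_ + j) * n2_ + k];
    }

private:
    double* data_;
    int n0_, n1_, n2_;
};

// Writes i + j + k into every element through the overloaded operator().
inline void fill_with_index_sum(TensorView3 t, int n0, int n1, int n2)
{
    #pragma acc parallel loop collapse(3)
    for (int i = 0; i < n0; ++i) {
        for (int j = 0; j < n1; ++j) {
            for (int k = 0; k < n2; ++k) {
                t(i, j, k) = static_cast<double>(i + j + k);
            }
        }
    }
}
```

Without managed memory, the flat array would need explicit data clauses and the pointer inside the view would have to be fixed up on the device, which is the data movement that can be optimized later.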

Sometimes I get the error ‘Unsupported union data type’. I posted a question about this in this forum some days ago, but I got no answer. Can you tell me what this error means and how I can solve it (or even answer the post: Unsupported union data type)?

Apologies for not getting you an answer on this one. Basically it’s a generic compiler error where it’s generating bad GPU code. However without a reproducing example I can’t tell what’s causing it nor how to work around it.

Jiri Kraus advised me to mention that I work on the ZFS (RWTH Aachen University), so you know which code I’m talking about.

Yes, he let me know.

Mat
