I am very new to OpenACC and still trying to learn exactly how everything works.
I made an application that uses the GMP bignum library, and I am trying to convert it to also use the computing power of my NVIDIA card.
I am able to get simple applications to work, but when I try to parallelize loops that contain references to the gmp datatypes or function calls, I am unable to get said loop to parallelize.
One error that often comes up, and that I can’t seem to make sense of, is:
Loop not vectorized/parallelized: contains call
What exactly does this mean?
Is there something fundamental about the way OpenACC works that I am missing when trying to accomplish my task? Should I be thinking about what I am trying to do from a different perspective? Is it possibly just not so simple as wrapping my loops with the appropriate pragmas?
The problem here is that there isn’t a device version of the called routine, so the compiler cannot offload a loop that calls it. You will need to either inline these routines or have the compiler create device-callable versions using the OpenACC “routine” directive.
This article gives a good overview of the “routine” directive: Account Login | PGI
Let us know if you need additional help or clarification.
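To illustrate the “routine” approach described above, here is a minimal sketch in C++. The names `square_plus_one` and `apply_square_plus_one` are hypothetical and stand in for whatever your loop calls. It assumes you have the source of the called function; a function that only exists in a precompiled host library would not get a device version this way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper standing in for a routine called inside the loop.
// "routine seq" asks an OpenACC compiler to also generate a device version
// that each loop iteration executes sequentially.
#pragma acc routine seq
double square_plus_one(double x) {
    return x * x + 1.0;
}

// Returns a vector whose i-th element is in[i] * in[i] + 1.
std::vector<double> apply_square_plus_one(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    const double* src = in.data();
    double* dst = out.data();
    const std::size_t n = in.size();

    // Raw pointers with explicit array shapes tell the compiler how much
    // data to copy to and from the device.
    #pragma acc parallel loop copyin(src[0:n]) copyout(dst[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = square_plus_one(src[i]);  // call is allowed: a device version exists
    }
    return out;
}
```

A compiler without OpenACC support ignores the pragmas and runs the loop on the host, so the same source can be checked for correctness before offloading.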
Inlining means substituting a function call with the function’s body. You can do it by hand or ask the compiler to do it. The compiler will try inlining if you pass additional command-line options. For example,
pgfortran test.f90 -Minline=foo -Minfo=inline
foo - the name of the function you want to inline
-Minfo=inline - display information on inlining
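As a rough illustration of what inlining by hand looks like, here is a small C++ sketch. The function names and the arithmetic are hypothetical. Both functions compute the same result, but only the first has a call in the loop body.

```cpp
// Hypothetical helper; the name and arithmetic are for illustration only.
static double scale(double x, double factor) {
    return x * factor;
}

// Loop body contains a call to scale().
double sum_with_call(const double* v, int n, double factor) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += scale(v[i], factor);
    }
    return total;
}

// Same loop after inlining by hand: the call is replaced by the body of scale().
double sum_inlined(const double* v, int n, double factor) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += v[i] * factor;
    }
    return total;
}
```

When the compiler performs the inlining for you, the source stays in the first form and the substitution happens during compilation.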
On that subject, then, how well does inlining work with classes and class methods? If I were to try to inline a class method or a method that results from an operator overload, how might I do that? Are there any examples available? I am using C++.