Accelerator restriction: unsupported statement type: opcode=DEALLOC

Shine_X.M_Zhai · July 1, 2019, 8:47pm

Hello,

I am able to reproduce a compilation error in my C++ and OpenACC program.

#include <iostream>

class foo
{
public:
        foo(int nbEqs, int nOrder);
        ~foo();
        void createDeviceData();
        void calculation();
private:
        int _nbEqs;
        int _elem;
};

foo::foo(int nbEqs, int nOrder)
{
        _nbEqs = nbEqs;
        _elem = nOrder;
}

foo::~foo()
{
}

void foo::createDeviceData()
{
        #pragma acc enter data copyin(this)
}

void foo::calculation()
{
        #pragma acc parallel loop \
        copyin(_elem)
        for (int i=0; i<_nbEqs; i++)
        {
          double temp[_elem];
          for (int i=0; i<_elem; i++)
          {
            temp[i] = 1.;
          }
        }
}
int main()
{
        foo obj(100,3);
        obj.createDeviceData();

        obj.calculation();

        std::cout << "Done" << std::endl;
}

With “pgc++ -acc -ta=nvidia,lineinfo -Minfo=accel test3.C”, the compilation error reads:

"test3.C", line 36: warning: variable "temp" was set but never used
  	  double temp[_elem];
  	         ^

foo::createDeviceData():
     28, Generating enter data copyin(this[:1])
PGCC-S-0155-Accelerator region ignored; see -Minfo messages  (test3.C: 31)
foo::calculation():
     31, Accelerator region ignored
     34, Accelerator restriction: loop contains unsupported statement type
     40, Accelerator restriction: unsupported statement type: opcode=DEALLOC
PGCC/x86-64 Linux 19.4-0: compilation completed with severe errors

Could you advise why the error would arise?

Thanks,
Shine

MatColgrove · July 1, 2019, 9:04pm

Hi Shine,

VLAs aren’t supported in device code.

While you can change this to use malloc/free instead (see below), dynamic allocation on the device should be avoided if possible. Mallocs are serialized, which can hurt performance. Also, the default heap size is 8MB, so allocating too much memory often leads to heap overflows.

Instead, I’d recommend you make temp fixed size or declare it before the parallel loop and then add it to a private clause.

Hope this helps,
Mat

Malloc version:

void foo::calculation()
{
        #pragma acc parallel loop \
        copyin(_elem)
        for (int i=0; i<_nbEqs; i++)
        {
          double *temp = (double*) malloc(sizeof(double)*_elem);
          for (int i=0; i<_elem; i++)
          {
            temp[i] = 1.;
          }
          free(temp);
        }
}
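
To get a feel for the 8MB heap limit mentioned above, here is a rough host-side estimate. It is only a sketch under stated assumptions: the helper names are hypothetical, 8MB is taken as 8 * 1024 * 1024 bytes, and it assumes the worst case where every loop iteration's buffer is live at the same time, ignoring any allocator overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Default device heap size quoted in the post, assumed here to mean 8 * 1024 * 1024 bytes.
const std::size_t kDefaultHeapBytes = 8u * 1024u * 1024u;

// Hypothetical helper: worst-case heap demand of the malloc version,
// assuming all nbEqs buffers of elem doubles are allocated at once.
inline std::size_t worstCaseHeapBytes(std::size_t nbEqs, std::size_t elem)
{
        const std::size_t perIteration = elem * sizeof(double);
        // Guard against overflow in the multiplication below.
        assert(perIteration == 0 ||
               nbEqs <= std::numeric_limits<std::size_t>::max() / perIteration);
        return nbEqs * perIteration;
}

// Hypothetical helper: true if the worst-case demand stays within the default heap.
inline bool fitsDefaultHeap(std::size_t nbEqs, std::size_t elem)
{
        return worstCaseHeapBytes(nbEqs, elem) <= kDefaultHeapBytes;
}
```

For the sizes in the original post (100 iterations, 3 elements) the estimate is tiny, but it grows linearly with both the loop count and the array length, which is why larger production cases can run into the limit.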

Private version:

void foo::calculation()
{

        double temp [_elem];
        #pragma acc parallel loop \
        copyin(_elem) private(temp[0:_elem])
        for (int i=0; i<_nbEqs; i++)
        {
          for (int i=0; i<_elem; i++)
          {
            temp[i] = 1.;
          }
        }
}
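
The fixed-size option recommended above is not shown in the post, so here is a minimal sketch of what it might look like. The upper bound kMaxElem is a hypothetical compile-time constant that is not from the thread, the members are copied to locals so the region does not depend on `this`, and calculation() returns a sum (with a reduction clause) only so the result can be checked.

```cpp
#include <cassert>

// Hypothetical compile-time upper bound on _elem; not from the thread.
constexpr int kMaxElem = 8;

class foo
{
public:
        foo(int nbEqs, int nOrder) : _nbEqs(nbEqs), _elem(nOrder)
        {
                // Host-side check that the run-time size fits the fixed buffer.
                assert(nOrder >= 0 && nOrder <= kMaxElem);
        }
        double calculation();
private:
        int _nbEqs;
        int _elem;
};

double foo::calculation()
{
        // Copy members to locals so the compute region uses plain scalars.
        const int nbEqs = _nbEqs;
        const int elem = _elem;
        double sum = 0.;
        #pragma acc parallel loop reduction(+:sum)
        for (int i = 0; i < nbEqs; i++)
        {
                double temp[kMaxElem];   // fixed size, so no device-side allocation
                double local = 0.;
                for (int j = 0; j < elem; j++)
                {
                        temp[j] = 1.;
                }
                for (int j = 0; j < elem; j++)
                {
                        local += temp[j];
                }
                sum += local;
        }
        return sum;
}
```

Because temp is declared inside the loop body with a constant size, each iteration gets its own copy without touching the device heap. The trade-off is that kMaxElem has to be chosen large enough for every case the code will see.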

Shine_X.M_Zhai · July 1, 2019, 9:42pm

Thanks, Mat, for your helpful replies as always. Yes, your solution works. Following your suggestions and our own tests, we will implement fixed-size arrays to avoid exceeding the heap limit (we have iterated with Alex of the PGI group on this).

Just out of curiosity, we have seen other types of VLA usages, for example:

#include <iostream>

class foo
{
public:
        foo(int nbEqs, int nOrder);
        ~foo();
        void createDeviceData();
        #pragma acc routine seq
        void calculation();
private:
        int _nbEqs;
        int _elem;
};

foo::foo(int nbEqs, int nOrder)
{
        _nbEqs = nbEqs;
        _elem = nOrder;
}

foo::~foo()
{
}

void foo::createDeviceData()
{
        #pragma acc enter data copyin(this)
}

void foo::calculation()
{
          double temp[_elem];
          for (int i=0; i<_elem; i++)
          {
            temp[i] = 1.;
          }
}

int main()
{
        foo obj(100,3);

        obj.createDeviceData();

        #pragma acc parallel loop
        for(int i=0; i<100; i++)
        {
          obj.calculation();
        }

        std::cout << "Done" << std::endl;
}

This code compiles and runs fine. I believe I am missing some subtle difference here regarding VLA usage. Could you shed some light on this?

Thanks,
Shine

MatColgrove · July 1, 2019, 10:13pm

Ok, I was a bit too broad in saying VLAs are not supported in device code. In a device subroutine, you can use VLAs or Fortran automatics. I’ve just seen several folks have issues with heap overflows when using them, so I don’t encourage their use.

Shine_X.M_Zhai · July 1, 2019, 10:29pm

Ah, I see the difference. Yes, my latest example is a device function while the original example is a plain accelerated OpenACC region. We will avoid VLAs in our upcoming development.

Is there a schedule for the next community release? Currently there is a strong indication in our production code that private arrays are actually not privatized. However I have not been able to reproduce it with a smaller code. I wonder whether this has already been fixed by TPR#27025 (Release 2019 Bug Fixes and Enhancements | PGI) so that the next community release will automatically incorporate the fix.

Thanks,
Shine

MatColgrove · July 2, 2019, 2:41pm

Hi Shine,

We typically release the community edition at the xx.4 and xx.10 releases, so the next one should be 19.10.

TPR#27025, which was fixed in 19.5, had to do with putting a private array on a worker loop. Are you also using a worker loop? If not, your issue may be something different. If you can get me a reproducing example, I can see what’s wrong and determine if it is the same issue or something else.

-Mat

Shine_X.M_Zhai · July 2, 2019, 3:33pm

Hi Mat,

Thanks for the information. As to my bug: no, I did not use a worker loop, so it is not likely related to TPR#27025. I will spend some time on a reproducer again today, and if I can get it to work I will report it to PGI customer service.

Thanks and have a good day,
Shine
