Invalid cast opcode for typedef variable


I’ve tried to compile the following code using the PGI compiler from SDK/22.7.
It seems that PGI is trying to implicitly convert a user-defined type to a pointer type.
The relevant definition is:

typedef struct complextype {
	float real, imag;
} Compl;

Compiling produces the following error message:

$ pgcc -mp -acc -Minfo=accel  mandelbrot_openacc5.c 

     48, Generating create(res[:][:]) [if not already present]
     54, Generating implicit private(j)
         Generating NVIDIA GPU code
         54, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
         55, #pragma acc loop seq
     54, Local memory used for z,c
     55, Complex loop carried dependence of __nv_main_F1L44_2_3->->,res prevents parallelization
         Loop carried scalar dependence for j at line 59,72,73,75
         Generating implicit private(j)
         Complex loop carried dependence of __nv_main_F1L44_2_3->-> prevents parallelization
     64, Loop carried scalar dependence for k at line 68
         Scalar last value needed after loop for k at line 72
     79, Generating update self(res[start:block_size][:])
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccXDWmtiOa2yOg.gpu (123, 23): parse invalid cast opcode for cast from '%struct.DT1_284 = type <{ float, float }>' to 'float*'
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (mandelbrot_openacc5.c: 1)
NVC++/x86-64 Linux 22.7-0: compilation aborted

I do not observe this issue with PGI/19.1.
Perhaps the treatment of typedef variables has changed in recent versions of PGI.
Any clarification regarding this issue would be much appreciated.

mandelbrot_openacc5.c (2.2 KB)

These are the “z” and “c” variables which are being privatized. Basically an array of these variables, one per thread, is getting allocated on each device. So it’s not that these get converted to pointer types, but rather it’s a pointer to an array of these types.

The error is a device code generation problem. The issue is the privatization of OpenMP shared variables. The workaround is either to declare z and c within the loop body, so they are local variables instead of private, or to add them to a private clause on the OpenMP parallel construct.

For example:

int end = start + (Y_RESN/num_blocks);
#pragma acc parallel loop private(k,temp,lengthsq) async(block%2)
for (i = start; i < end; i++) {
    for (j = 0; j < X_RESN; j++) {
        Compl z, c;
        z.real = 0.0;
        z.imag = 0.0;


Or, to instead privatize them on the OpenMP parallel construct:

#pragma omp parallel num_threads(num_gpus) private(z,c)
    int my_gpu = omp_get_thread_num();

Note that I typically recommend folks use MPI rather than OpenMP for multi-GPU support. With OpenMP, you need to do the domain decomposition yourself, which isn’t natural, as opposed to MPI where it is. It’s not too difficult with this example, but it becomes difficult for larger codes.

Second, with OpenMP you’re limited to a single system, while MPI allows you to go multi-node, and you can use CUDA-aware MPI to improve communication between ranks. In other words, if you’re going through the trouble of doing the domain decomposition, you should get the benefit of being able to run multi-node.

Finally, with OpenMP it can be very tricky to merge a shared host array from the multiple copies of the array, one on each GPU, and you waste memory on each GPU for the portions of the array that aren’t used. Managing halos is particularly difficult to get correct.

Hope this helps,


Thanks very much for the detailed answers.
The code now compiles without issue.
I will heed your advice regarding the usage of OpenMP.

