Segfault and "Missing branch target block" error while compiling larger functions with -O1 vs -O2 optimization levels

Dear All,

Similar to other issues reported by me or my colleagues, I encountered this issue while porting the NEURON+CoreNEURON application with OpenACC. I do not have a smaller reproducer, but I can explain the behaviour I am seeing. I have attached a standalone, preprocessed .cpp file that you can use to reproduce the errors below. This is with NVHPC v22.3.

In the attached file we have an auto-generated function, _nrn_init__Bounce, with the following structure:

void _nrn_init__Bounce(NrnThread* _nt, Memb_list* _ml, int _type){
    double* _p; Datum* _ppvar; ThreadDatum* _thread;
    
    if(_ml->instance == nullptr) {
        _ml->instance = malloc(sizeof(_global_variables_t));
        _copy_globals_to_gpu = true;
    }
    _initlists(_ml);
    _update_global_variables(_nt, _ml, _copy_globals_to_gpu);

    _Pragma("acc parallel loop present(_ni[0:_cntml_actual], _nt_data[0:_nt->_ndata], _p[0:_cntml_padded*_psize], _ppvar[0:_cntml_padded*_ppsize], _vec_v[0:_nt->end], nrn_ion_global_map[0:nrn_ion_global_map_size][0:ion_global_map_member_size], _nt[0:1] _thread_present_) if(_nt->compute_gpu)")
    for (_iml = 0; _iml < _cntml_actual; ++_iml) {
        int _nd_idx = _ni[_iml];
        _p[9*_cntml_padded + _iml] = -1e20;
        _v = _vec_v[_nd_idx];
        v = _v;
        initmodel__Bounce(_iml, _cntml_padded, _p, _ppvar, _thread, _nt, _ml, v);
    }
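For readers unfamiliar with the generated code: the index expression _p[9*_cntml_padded + _iml] suggests a structure-of-arrays layout, where variable k of instance i is stored at offset k*_cntml_padded + i. Below is a minimal sketch of that indexing under this assumption. The names init_variable, n_actual, n_padded and n_vars are hypothetical, and the sketch uses a copy clause instead of present so that it does not rely on data already being on the device.

```cpp
#include <cassert>

// Hypothetical helper, not part of CoreNEURON: sets variable `var` of every
// real instance to `value` in a structure-of-arrays buffer.
// Layout assumption: variable k of instance i lives at p[k * n_padded + i].
void init_variable(double* p, int var, int n_actual, int n_padded, int n_vars,
                   double value, bool use_gpu) {
    assert(n_actual <= n_padded && var < n_vars);
    // `copy` (rather than `present`) keeps the sketch self-contained.
    // Without OpenACC enabled the pragma is ignored and the loop runs
    // serially on the host.
    #pragma acc parallel loop copy(p[0:n_vars * n_padded]) if(use_gpu)
    for (int i = 0; i < n_actual; ++i) {
        p[var * n_padded + i] = value;
    }
}
```

Only the first n_actual entries of each variable are touched; the padding entries up to n_padded are left as they were.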

You can see that _update_global_variables is called outside the OpenACC loop. Its body looks like this:

static void _update_global_variables(NrnThread *_nt, Memb_list *_ml, bool _copy_globals_to_gpu) {
   if(_nt == nullptr || _ml == nullptr) {
     return;
   }
   _global_variables_t* _global_variables = reinterpret_cast<_global_variables_t*>(_ml->instance);
   _global_variables->_ml_mechtype = _mechtype;
   _global_variables->HighThresh = HighThresh;
   _global_variables->LowThresh = LowThresh;

#ifdef NRN_SEGFAULT
   if (_nt->compute_gpu) {
        auto* _d_global_vars = cnrn_target_copyin(_global_variables);
        auto* _d_ml = reinterpret_cast<Memb_list*>(acc_deviceptr(_ml));
        cnrn_target_memcpy_to_device(&(_d_ml->instance), (void**)&(_d_global_vars));
   }
#endif
 }

In our production application I started adding some logic to this _update_global_variables() function (no accelerator regions, only CPU-side code calling some OpenACC APIs), and then I started seeing the errors below. The errors seem to be related to function size and inlining, because I have similar but smaller auto-generated files for other parts and they don’t fail.
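To illustrate the kind of CPU-side logic meant here, below is a minimal sketch of copying a struct to the device and patching a pointer field in the device copy of its parent. It uses the standard OpenACC runtime calls acc_copyin, acc_deviceptr and acc_memcpy_to_device directly. The cnrn_target_* wrappers in the real code may behave differently, and the types Globals and MembList and the function attach_globals are hypothetical, simplified stand-ins. Host-only stubs are included so that the sketch also builds without OpenACC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef _OPENACC
#include <openacc.h>
#else
// Host-only stand-ins (assumption for illustration): the "device" copy is
// simply the host object itself.
static void* acc_copyin(void* host, std::size_t) { return host; }
static void* acc_deviceptr(void* host) { return host; }
static void acc_memcpy_to_device(void* dst, void* src, std::size_t n) {
    std::memcpy(dst, src, n);
}
#endif

// Hypothetical, simplified stand-ins for the generated types.
struct Globals { double HighThresh; double LowThresh; };
struct MembList { void* instance; };

// Copies `g` to the device and makes the device copy of `ml` point at it.
// Assumes `ml` itself is already present on the device.
void attach_globals(MembList* ml, Globals* g) {
    assert(ml != nullptr && g != nullptr);
    void* d_g = acc_copyin(g, sizeof(Globals));
    auto* d_ml = static_cast<MembList*>(acc_deviceptr(ml));
    // Overwrite only the pointer field inside the device-side struct.
    acc_memcpy_to_device(&d_ml->instance, &d_g, sizeof(void*));
}
```

None of this runs inside an accelerator region, which is why it is surprising that such code changes what the compiler does with the OpenACC kernels elsewhere in the file.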

If we compile the attached file with -O1, everything works fine:

$ nvc++ -mp -acc -c nrn_svcmalp.cpp -O1 --diag_suppress=111,550 --diag_suppress=161,177

With -O2 we see:

$ nvc++ -mp -acc -c nrn_svcmalp.cpp -O2 --diag_suppress=111,550 --diag_suppress=161,177
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Missing branch target block (nrn_svcmalp.cpp: 1)
NVC++-F-0704-Compilation aborted due to previous errors. (nrn_svcmalp.cpp)
NVC++/x86-64 Linux 22.3-0: compilation aborted

And if we enable the code under #ifdef NRN_SEGFAULT, we get:

$ nvc++ -mp -acc -c nrn_svcmalp.cpp -O2 --diag_suppress=111,550 --diag_suppress=161,177 -DNRN_SEGFAULT
nvc++-Fatal-/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2 TERMINATED by signal 11
Arguments to /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2 nrn_svcmalp.cpp -opt 2 -terse 1 -inform warn -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -vect 56 -y 34 16 -x 37 0x480000 -x 34 0x8 -x 32 25952256 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 42 0x20000 -x 39 0x80 -x 59 4 -x 129 2 -tp skylake-avx512 -x 120 0x1000 -astype 0 -x 121 1 -fn nrn_svcmalp.cpp -il /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/kumbhar/391227/nvc++Vd4un6X9aIZC.il -x 117 0x200 -x 123 0x80000000 -x 123 4 -x 119 0x20 -def __pgnu_vsn=110200 -x 70 0x40000000 -x 183 4 -x 121 0x800 -x 6 0x20000 -autoinl 10 -x 168 400 -x 174 128000 -x 14 0x200000 -x 14 0x400000 -autoinl 10 -x 168 400 -x 174 128000 -x 14 0x200000 -x 14 0x400000 -x 249 130 -x 120 0x200000 -x 70 0x40000000 -x 8 0x40000000 -x 164 0x800000 -x 85 0x2000 -x 85 0x4000 -x 34 0x40000000 -x 53 0x800000 -x 83 0x1 -x 84 0x3c7 -x 85 0x1 -x 206 0x02 -x 68 0x1 -x 39 4 -x 56 0x10 -x 26 0x10 -x 26 1 -x 56 0x4000 -accel tesla -accel host -x 180 0x4000400 -x 121 0xc00 -x 186 0x80 -x 180 0x4000400 -x 121 0xc00 -x 194 0x40000 -x 163 0x1 -x 186 0x80000 -cudaver 11060 -x 176 0x100 -cudacap 70 -cudaroot /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/cuda-11.6.1-ngetva -x 189 0x8000 -y 163 0xc0000000 -x 189 0x10 -y 189 0x4000000 -cudaroot /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/cuda-11.6.1-ngetva -x 187 0x40000 -x 187 0x8000000 -x 9 1 -x 72 0x1 -x 136 0x11 -mp -x 69 0x200 -x 69 0x400 -x 69 2 -x 9 1 -x 72 0x1 -x 136 0x11 -gnuvsn 110200 -x 69 0x200 -x 123 0x400 -x 180 0x4000000 -x 194 0x20000000 -cmdline '+nvc++ /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/kumbhar/391227/nvc++Vd4un6X9aIZC.il -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -mp -acc -c -O2 -Mvect=simd --diag_suppress=111,550 --diag_suppress=161,177 
-DNRN_SEGFAULT' -asm /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/kumbhar/391227/nvc++Fd4uDQkReR5O.ll

Not that interesting, but adding the -Minfo=accel flag removes the segfault (the "Missing branch target block" error is still reported):

$ nvc++ -mp -acc -c nrn_svcmalp.cpp -O2 --diag_suppress=111,550 --diag_suppress=161,177 -DNRN_SEGFAULT -Minfo=accel
coreneuron::_net_buf_receive_Bounce(coreneuron::NrnThread *):
      1, include "svclmp.cpp"
         387, Generating present(_nt[:1],_pnt[:_pnt_length],_ZN10coreneuron11nrn_threadsE[:_ZN10coreneuron11nrn_nthreadE],_nrb[:1])
              Generating NVIDIA GPU code
             387, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
             391, #pragma acc loop seq
             513, #pragma acc loop seq
             514, #pragma acc loop seq
             519, #pragma acc loop seq
             520, #pragma acc loop seq
             525, #pragma acc loop seq
             526, #pragma acc loop seq
         391, Complex loop carried dependence of _nrb->_nrb_index,_nrb->_nrb_index->,_nrb->_pnt_index,_nrb->_pnt_index->,_nrb->_weight_index,_nrb->_weight_index->,_nrb->_nrb_t,_nrb->_nrb_t->,_nrb->_nrb_flag,_nrb->_nrb_flag->,_pnt+((_j)*8)->_tid,..inline->_weights,..inline->_ml_list,_pnt+((_j)*8)->_type,..inline->_ml_list->,..inline->_ml_list->->_nodecount_padded,_pnt+((_j)*8)->_i_instance,..inline->_ml_list->->_data,..inline->_ml_list->->_pdata,..inline->_ml_list->->_data-> prevents parallelization
              Loop carried dependence of ..inline->_ml_list->->_data-> prevents parallelization
              Loop carried backward dependence of ..inline->_ml_list->->_data-> prevents vectorization
              Complex loop carried dependence of ..inline->_vdata,..inline->_ml_list->->_pdata->,..inline->_vdata->,..inline->_vdata->-> prevents parallelization
              Loop carried dependence of ..inline->_vdata->-> prevents parallelization
              Loop carried backward dependence of ..inline->_vdata->-> prevents vectorization
              Complex loop carried dependence of ..inline->_ml_list->->_net_send_buffer,..inline->_cnt,..inline->_size,..inline->_sendtype,..inline->_sendtype-> prevents parallelization
              Loop carried dependence of ..inline->_sendtype-> prevents parallelization
              Loop carried backward dependence of ..inline->_sendtype-> prevents vectorization
              Complex loop carried dependence of ..inline->_vdata_index,..inline->_vdata_index-> prevents parallelization
              Loop carried dependence of ..inline->_vdata_index-> prevents parallelization
              Loop carried backward dependence of ..inline->_vdata_index-> prevents vectorization
              Complex loop carried dependence of ..inline->_weight_index,..inline->_weight_index-> prevents parallelization
              Loop carried dependence of ..inline->_weight_index-> prevents parallelization
              Loop carried backward dependence of ..inline->_weight_index-> prevents vectorization
              Complex loop carried dependence of ..inline->_pnt_index,..inline->_pnt_index-> prevents parallelization
              Loop carried dependence of ..inline->_pnt_index-> prevents parallelization
              Loop carried backward dependence of ..inline->_pnt_index-> prevents vectorization
              Complex loop carried dependence of ..inline->_nsb_t,..inline->_nsb_t-> prevents parallelization
              Loop carried dependence of ..inline->_nsb_t-> prevents parallelization
              Loop carried backward dependence of ..inline->_nsb_t-> prevents vectorization
              Complex loop carried dependence of ..inline->_nsb_flag,..inline->_nsb_flag-> prevents parallelization
              Loop carried dependence of ..inline->_nsb_flag-> prevents parallelization
              Loop carried backward dependence of ..inline->_nsb_flag-> prevents vectorization
              Complex loop carried dependence of ..inline->_ml_list->->instance,..inline->_ml_list->->instance-> prevents parallelization
              Loop carried dependence of ..inline->_ml_list->->_pdata-> prevents parallelization
              Loop carried backward dependence of ..inline->_ml_list->->_pdata-> prevents vectorization
         409, Generating update self(_nsb->_cnt)
         421, Generating update device(_nsb->_cnt)
         513, Loop is parallelizable
         514, Loop is parallelizable
         519, Loop is parallelizable
         520, Loop is parallelizable
         525, Loop is parallelizable
         526, Loop is parallelizable
coreneuron::uniform_Bounce(int, int, double *, int *, coreneuron::ThreadDatum *, coreneuron::NrnThread *, coreneuron::Memb_list *, double):
      1, include "svclmp.cpp"
         535, Generating acc routine seq
              Generating NVIDIA GPU code
coreneuron::_nrn_watch_check__Bounce(coreneuron::NrnThread *, coreneuron::Memb_list *):
      1, include "svclmp.cpp"
         640, Generating present(_ml[:1],_nt[:1],_nt_data[:_nt->_ndata],_p[:_cntml_padded*10],_ppvar[:_cntml_padded*11],_vec_v[:_nt->end],_ni[:_cntml_actual])
              Generating NVIDIA GPU code
             640, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
         753, Generating update self(_nsb->_cnt)
         766, Generating update device(_nsb->_cnt)
coreneuron::_nrn_init__Bounce(coreneuron::NrnThread *, coreneuron::Memb_list *, int):
      1, include "svclmp.cpp"
         826, Generating present(_ZN10coreneuron18nrn_ion_global_mapE[:_ZN10coreneuron23nrn_ion_global_map_sizeE][:_ZN10coreneuron26ion_global_map_member_sizeE],_nt[:1],_nt_data[:_nt->_ndata],_p[:_cntml_padded*10],_ppvar[:_cntml_padded*11],_vec_v[:_nt->end],_ni[:_cntml_actual])
              Generating NVIDIA GPU code
             826, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
         826, Generating implicit copyin(_thread,_ml) [if not already present]
         842, Generating update self(_nsb->_cnt)
         855, Generating update device(_nsb->_cnt)
coreneuron::_nrn_state__Bounce(coreneuron::NrnThread *, coreneuron::Memb_list *, int):
      1, include "svclmp.cpp"
         901, Generating present(_ml[:1],_nt[:1],_nt_data[:_nt->_ndata],_p[:_cntml_padded*10],_vec_v[:_nt->end],_ppvar[:_cntml_padded*11],_ni[:_cntml_actual])
      1, include "svclmp.cpp"
         901, Generating NVIDIA GPU code
             901, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
[local to nrn_svcmalp_cpp]::coreneuron::_net_send_buffering([local to nrn_svcmalp_cpp]::coreneuron::NetSendBuffer_t *, int, int, int, int, double, double):
      1, include "svclmp.cpp"
      1, include "svclmp.cpp"
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Missing branch target block (nrn_svcmalp.cpp: 1)
[local to nrn_svcmalp_cpp]::coreneuron::_net_receive_kernel([local to nrn_svcmalp_cpp]::coreneuron::NrnThread *, double, [local to nrn_svcmalp_cpp]::coreneuron::Point_process *, int, double):
      1, include "svclmp.cpp"
      1, include "svclmp.cpp"
NVC++-F-0704-Compilation aborted due to previous errors. (nrn_svcmalp.cpp)
NVC++/x86-64 Linux 22.3-0: compilation aborted

I don’t have a standalone example for this, but in our production build, while extending the same attached file with additional code, I have also seen:

ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2': corrupted size vs. prev_size: 0x00000000032d78e0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7fffed07e474]
/lib64/libc.so.6(+0x8156b)[0x7fffed08056b]
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2[0x77e66a]
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2[0x460c78]
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2[0x488fdc]
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2[0x410569]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fffed021555]
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2[0x4114a0]
======= Memory map: ========
00400000-009a3000 r-xp 00000000 00:2f 789324260                          /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2
00ba2000-00ba3000 r--p 005a2000 00:2f 789324260                          /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2
00ba3000-00c29000 rw-p 005a3000 00:2f 789324260                          /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2
00c29000-06660000 rw-p 00000000 00:00 0                                  [heap]
7fffe8000000-7fffe8021000 rw-p 00000000 00:00 0
7fffe8021000-7fffec000000 ---p 00000000 00:00 0
7fffecfff000-7fffed1c3000 r-xp 00000000 00:14 1130435626                 /usr/lib64/libc-2.17.so
7fffed1c3000-7fffed3c2000 ---p 001c4000 00:14 1130435626                 /usr/lib64/libc-2.17.so
7fffed3c2000-7fffed3c6000 r--p 001c3000 00:14 1130435626                 /usr/lib64/libc-2.17.so
7fffed3c6000-7fffed3c8000 rw-p 001c7000 00:14 1130435626                 /usr/lib64/libc-2.17.so
7fffed3c8000-7fffed3cd000 rw-p 00000000 00:00 0
7fffed3cd000-7fffed4ce000 r-xp 00000000 00:14 1130394199                 /usr/lib64/libm-2.17.so
7fffed4ce000-7fffed6cd000 ---p 00101000 00:14 1130394199                 /usr/lib64/libm-2.17.so
7fffed6cd000-7fffed6ce000 r--p 00100000 00:14 1130394199                 /usr/lib64/libm-2.17.so
7fffed6ce000-7fffed6cf000 rw-p 00101000 00:14 1130394199                 /usr/lib64/libm-2.17.so
7fffed6cf000-7fffed770000 r--p 00000000 00:2f 478727786                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6.0.29
7fffed770000-7fffed865000 r-xp 000a1000 00:2f 478727786                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6.0.29
7fffed865000-7fffed8d2000 r--p 00196000 00:2f 478727786                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6.0.29
7fffed8d2000-7fffed8dd000 r--p 00202000 00:2f 478727786                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6.0.29
7fffed8dd000-7fffed8e0000 rw-p 0020d000 00:2f 478727786                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6.0.29
7fffed8e0000-7fffed8e3000 rw-p 00000000 00:00 0
7fffed8e3000-7fffed905000 r-xp 00000000 00:14 1130466476                 /usr/lib64/ld-2.17.so
7fffedaa9000-7fffedaae000 rw-p 00000000 00:00 0
7fffedaae000-7fffedab1000 r--p 00000000 00:2f 471411131                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libgcc_s.so.1
7fffedab1000-7fffedac3000 r-xp 00003000 00:2f 471411131                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libgcc_s.so.1
7fffedac3000-7fffedac6000 r--p 00015000 00:2f 471411131                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libgcc_s.so.1
7fffedac6000-7fffedac7000 r--p 00017000 00:2f 471411131                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libgcc_s.so.1
7fffedac7000-7fffedac8000 rw-p 00018000 00:2f 471411131                  /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libgcc_s.so.1
7fffedac8000-7fffedac9000 rw-p 00000000 00:00 0
7fffedaf0000-7fffedaf2000 rw-p 00000000 00:00 0
7fffedaf3000-7fffedb02000 rw-p 00000000 00:00 0
7fffedb02000-7fffedb04000 r-xp 00000000 00:00 0                          [vdso]
7fffedb04000-7fffedb05000 r--p 00021000 00:14 1130466476                 /usr/lib64/ld-2.17.so
7fffedb05000-7fffedb06000 rw-p 00022000 00:14 1130466476                 /usr/lib64/ld-2.17.so
7fffedb06000-7fffedb07000 rw-p 00000000 00:00 0
7ffffffce000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
nvc++-Fatal-/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2 TERMINATED by signal 6
Arguments to /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/compilers/bin/tools/cpp2 x86_64/corenrn/mod2c/svclmp.cpp -debug -x 120 0x200 -opt 2 -terse 1 -inform warn -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -vect 56 -y 34 16 -x 37 0x480000 -x 34 0x8 -x 32 25952256 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 42 0x20000 -x 39 0x80 -x 59 4 -x 129 2 -tp skylake-avx512 -astype 0 -x 121 1 -fn x86_64/corenrn/mod2c/svclmp.cpp -il /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/kumbhar/391227/nvc++h-7pve1ZQy5G.il -x 117 0x200 -x 123 0x80000000 -x 123 4 -x 119 0x20 -def __pgnu_vsn=110200 -x 70 0x40000000 -x 183 4 -x 121 0x800 -x 6 0x20000 -autoinl 10 -x 168 400 -x 174 128000 -x 14 0x200000 -x 14 0x400000 -autoinl 10 -x 168 400 -x 174 128000 -x 14 0x200000 -x 14 0x400000 -autoinl 10 -x 168 400 -x 174 128000 -x 14 0x200000 -x 14 0x400000 -x 249 130 -x 176 0x100 -cudacap 70 -cudacap 80 -cudaver 11060 -x 120 0x200000 -x 70 0x40000000 -x 8 0x40000000 -x 164 0x800000 -x 85 0x2000 -x 85 0x4000 -x 34 0x40000000 -x 53 0x800000 -x 83 0x1 -x 84 0x3c7 -x 85 0x1 -x 206 0x02 -x 68 0x1 -x 39 4 -x 56 0x10 -x 26 0x10 -x 26 1 -x 56 0x4000 -accel tesla -accel tesla -x 180 0x4000400 -x 121 0xc00 -cudaroot /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/cuda/11.6 -x 176 0x100 -cudacap 70 -cudacap 80 -x 180 0x4000400 -x 121 0xc00 -x 186 0x80 -x 180 0x4000400 -x 121 0xc00 -x 194 0x40000 -x 163 0x1 -x 186 0x80000 -cudaver 11060 -x 176 0x100 -cudacap 70 -cudacap 80 -cudaroot /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/cuda/11.6 -x 189 0x8000 -y 163 0xc0000000 -x 192 0x40000000 -x 189 0x10 -y 189 0x4000000 -cudaroot 
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/cuda/11.6 -x 187 0x40000 -x 187 0x8000000 -x 176 0x100 -cudacap 70 -cudacap 80 -cudaver 11060 -x 9 1 -x 72 0x1 -x 136 0x11 -mp -x 69 0x200 -x 69 0x400 -x 69 2 -x 9 1 -x 72 0x1 -x 136 0x11 -x 137 1 -cudaroot /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/nvhpc-22.3-ukx56m/Linux_x86_64/22.3/cuda/11.6 -x 192 0x40000000 -x 62 8 -gnuvsn 110200 -x 69 0x200 -x 123 0x400 -x 180 0x4000000 -x 194 0x20000000 -x 119 0x08 -x 137 1 -x 137 0x200000 -cmdline '+nvc++ /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/kumbhar/391227/nvc++h-7pve1ZQy5G.il -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -mp -g -O2 -Mvect=simd --c++14 -cuda -gpu=cuda11.6,lineinfo,cc70,cc80 -acc -Mautoinline -DEIGEN_DONT_VECTORIZE=1 -fPIC -DCORENEURON_CUDA_PROFILING -DCORENEURON_ENABLE_GPU -DCORENEURON_BUILD -DHAVE_MALLOC_H -DNRNMPI=1 -DMPI_NO_CPPBIND=1 -DOMPI_SKIP_MPICXX=1 -DMPICH_SKIP_MPICXX=1 -DLAYOUT=0 -DDISABLE_HOC_EXP -DENABLE_SPLAYTREE_QUEUING -I/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/nrn/external/coreneuron/build_gpu/include/coreneuron/utils/randoms -I/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/nrn/external/coreneuron/build_gpu/include -I/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/hpe-mpi-2.25.hmpt-4ukyxt/include -c -o x86_64/corenrn/build/svclmp.o -DNRN_PRCELLSTATE=0 --diag_suppress=111,550 --diag_suppress=161,177' -asm /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/kumbhar/391227/nvc++3-7pLEkD3g5a.ll
make[3]: *** [x86_64/corenrn/build/svclmp.o] Error 127

Let me know if you need more information.

nrn_svcmalp.cpp (2.1 MB)

Thanks for the report!

I was able to recreate the error here and tracked it down to the “atomic capture” in the “_net_send_buffering” routine and an issue when auto-inlining the atomic. I filed a report (TPR #31709) and sent it to engineering for investigation.

The workaround is to either add a structured block to the atomic capture or disable auto-inlining via the “-Mnoautoinline” flag.

static void _net_send_buffering(NetSendBuffer_t* _nsb, int _sendtype, int _i_vdata, int _weight_index,
 int _ipnt, double _t, double _flag) {
  int _i = 0;
#ifdef WORKS
  #pragma acc atomic capture
  {
     _i = _nsb->_cnt++;
  }
#else
  #pragma acc atomic capture
  _i = _nsb->_cnt++;
#endif
% nvc++ -mp -acc -c nrn_svcmalp.cpp -O2 --diag_suppress=111,550 --diag_suppress=161,177 --no_preincludes --c++17 --gcc-toolchain=/home/sw/thirdparty/gcc/gcc-11.2.0/Linux_x86_64/ -DNRN_SEGFAULT
NVC++-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Missing branch target block (nrn_svcmalp.cpp: 1)
NVC++-F-0704-Compilation aborted due to previous errors. (nrn_svcmalp.cpp)
NVC++/x86-64 Linux Rel Dev-r212943: compilation aborted

% nvc++ -mp -acc -c nrn_svcmalp.cpp -O2 --diag_suppress=111,550 --diag_suppress=161,177 --no_preincludes --c++17 --gcc-toolchain=/home/sw/thirdparty/gcc/gcc-11.2.0/Linux_x86_64/ -DNRN_SEGFAULT -DWORKS
% nvc++ -mp -acc -c nrn_svcmalp.cpp -O2 --diag_suppress=111,550 --diag_suppress=161,177 --no_preincludes --c++17 --gcc-toolchain=/home/sw/thirdparty/gcc/gcc-11.2.0/Linux_x86_64/ -DNRN_SEGFAULT -Mnoautoinline
%
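For anyone who wants to try the structured-block form in isolation, here is a minimal, self-contained sketch of the same slot-reservation pattern. SendBuffer and reserve_slot are hypothetical, reduced stand-ins for NetSendBuffer_t and _net_send_buffering, so this is only an illustration of the pattern and not the code from the attached file.

```cpp
#include <cassert>

// Hypothetical, reduced stand-in for NetSendBuffer_t.
struct SendBuffer {
    int _cnt;  // number of slots handed out so far
};

// Reserves the next free slot and returns its index.
int reserve_slot(SendBuffer* nsb) {
    assert(nsb != nullptr);
    int i = 0;
    // Structured-block form of the capture, as in the -DWORKS branch above.
    // Without OpenACC enabled the pragma is ignored and this is a plain
    // post-increment.
    #pragma acc atomic capture
    {
        i = nsb->_cnt++;
    }
    return i;
}
```

Each call returns the old counter value and bumps the counter, so concurrent callers on the device each get a distinct slot index.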

-Mat

Thank you very much, Mat, for the quick response!

We need -Mautoinline for some other historical reasons/issues, but I am glad to know about the temporary workaround using a structured block!

I tested this in our production code and it compiles fine now. (This saves me from searching for an uglier workaround.)