Nested loops and zeroed variables

I am currently working with #pragma acc directives on a CUDA accelerator. It has been working rather smoothly, but recently I have run into some curious behavior. The code has at least 5-6 levels of nested loops, but in the computations starting from the penultimate loop all the variables are zeroed, and so zeros are the output of the algorithm. Without the #pragma acc directives the code runs fine. I tried -Mvect=levels: but this does not help, while -Mconcur=levels: makes the code crash.

Could you provide any hint?

Thanks in advance.

Hi Jon,

We currently max out at 7 loop levels (though we are in the process of expanding this), but since you’re only at 5-6 levels, this shouldn’t matter. Something else is going on.

Can you please post or send to PGI Customer Service (trs@pgroup.com) a reproducing example?

If not, what is the output from “-Minfo=accel”? How are the loops being scheduled?

-Mat

Hi mkcolg,

Thank you for the prompt answer. I cannot share the code, but I can provide the output of the compilation. Please note that this is a mex function for Matlab, and the whole environment I have set up to build and run such Matlab extensions is working properly. The loops at lines 966 and 1035 are the ones that are not working; the variables computed earlier in the code end up zeroed there.

mex -g -DCUDA -DDEBUG_MODE addTotalClutter_rain_mex.c RainClutter.c

PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC/x86-64-Extractor Windows 12.10-0: completed with warnings

“C:/Program Files/PGI/win64/12.10/bin\pgc_ex.EXE” addTotalClutter_rain_mex.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc “C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include” -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def WIN32 -def _WIN64 -def __WIN64 -def WIN64 -def x86_64 -def X86_64 -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def extension= -def amd64 -def SSE -def MMX -def SSE2 -def SSE3 -def SSSE3 -def fastcall= -def __PGI_TOOLS10 -predicate “#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)” -idir “C:\Program Files\MATLAB\R2011b\extern\include” -idir “C:\Program Files\MATLAB\R2011b\simulink\include” -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline “C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il” 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 0x140000 -x 177 0x0202007f -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 14 0x80 -exlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2azazblGriZM-K.ext

“C:/Program Files/PGI/win64/12.10/bin\pgc.EXE” addTotalClutter_rain_mex.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc “C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include” -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def WIN32 -def _WIN64 -def __WIN64 -def WIN64 -def x86_64 -def X86_64 -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def extension= -def amd64 -def SSE -def MMX -def SSE2 -def SSE3 -def SSSE3 -def fastcall= -def __PGI_TOOLS10 -predicate “#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)” -idir “C:\Program Files\MATLAB\R2011b\extern\include” -idir “C:\Program Files\MATLAB\R2011b\simulink\include” -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -cmdline “+pgcc addTotalClutter_rain_mex.c -m64 -DCUDA -DDEBUG_MODE -c -acc -Minfo=all -Minline -Mvect=levels:10 -fast -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre -DMATLAB_MEX_FILE -DPGI_COMPILER -v -IC:\Program Files\MATLAB\R2011b\extern\include -IC:\Program Files\MATLAB\R2011b\simulink\include -g -DMX_COMPAT_32” -inlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2azazblGriZM-K.ext -x 14 32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline “C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il” 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 0x140000 -x 177 0x0202007f -x 0 0x1000000 -x 2 
0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
mexFunction:
127, Loop not vectorized/parallelized: contains call
141, Loop not vectorized/parallelized: contains call
155, Memory copy idiom, loop replaced by call to __c_mcopy8
PGC/x86-64 Windows 12.10-0: compilation completed with warnings
-x 129 2 -x 164 0x1000 -asm C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3bHazbJH9AYu9w.sm

“C:/Program Files/PGI/win64/12.10/bin\pgsmart.EXE” -agg 0x62000020 -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4cPazb7Tjh8Hyl.s C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3bHazbJH9AYu9w.sm

“C:/Program Files/PGI/win64/12.10/bin\as64.EXE” C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4cPazb7Tjh8Hyl.s “-IC:\Program Files\MATLAB\R2011b\extern\include/” “-IC:\Program Files\MATLAB\R2011b\simulink\include/” -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dXazbtD34R-da.obj

“C:/Program Files/PGI/win64/12.10/bin\pgcnv.EXE” C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dXazbtD34R-da.obj addTotalClutter_rain_mex.obj
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3bHazbJH9AYu9w.sm
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4cPazb7Tjh8Hyl.s
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dXazbtD34R-da.obj
Unlinking directory C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2azazblGriZM-K.ext
PGC/x86-64-Extractor Windows 12.10-0: completed

“C:/Program Files/PGI/win64/12.10/bin\pgc_ex.EXE” RainClutter.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc “C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include” -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def WIN32 -def _WIN64 -def __WIN64 -def WIN64 -def x86_64 -def X86_64 -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def extension= -def amd64 -def SSE -def MMX -def SSE2 -def SSE3 -def SSSE3 -def fastcall= -def __PGI_TOOLS10 -predicate “#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)” -idir “C:\Program Files\MATLAB\R2011b\extern\include” -idir “C:\Program Files\MATLAB\R2011b\simulink\include” -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline “C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il” 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 0x140000 -x 177 0x0202007f -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 14 0x80 -exlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2ajWzbBe552NLl.ext

“C:/Program Files/PGI/win64/12.10/bin\pgc.EXE” RainClutter.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc “C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include” -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def WIN32 -def _WIN64 -def __WIN64 -def WIN64 -def x86_64 -def X86_64 -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def extension= -def amd64 -def SSE -def MMX -def SSE2 -def SSE3 -def SSSE3 -def fastcall= -def __PGI_TOOLS10 -predicate “#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)” -idir “C:\Program Files\MATLAB\R2011b\extern\include” -idir “C:\Program Files\MATLAB\R2011b\simulink\include” -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -cmdline “+pgcc RainClutter.c -m64 -DCUDA -DDEBUG_MODE -c -acc -Minfo=all -Minline -Mvect=levels:10 -fast -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre -DMATLAB_MEX_FILE -DPGI_COMPILER -v -IC:\Program Files\MATLAB\R2011b\extern\include -IC:\Program Files\MATLAB\R2011b\simulink\include -g -DMX_COMPAT_32” -inlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2ajWzbBe552NLl.ext -x 14 32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline “C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il” 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 0x140000 -x 177 0x0202007f -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 
161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -asm C:\Users\ADexecuting C:/Program Files/PGI/win64/12.10/bin/pgnvd C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc2a0G7CRm6WoML.gpu -computecap=13 -ptx C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc3buG78gYvxywW.ptx -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc4c0G7C_cE03ib.bin -ptxinfo C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc5duG78XA4qzqi.info -4.1
executing C:/Program Files/PGI/win64/12.10/bin/pgnvd C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc2a0G7CRm6WoML.gpu -computecap=20 -ptx C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc6e0G7CbX13it4.ptx -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc7fuG78m5TuQRy.bin -ptxinfo C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc8g0G7C0c3WriS.info -4.1
RainClutter:
208, Loop not vectorized: may not be beneficial
Unrolled inner loop 4 times
Used combined stores for 1 stores
233, Loop not vectorized: may not be beneficial
Generated an alternate version of the loop
Unrolled inner loop 4 times
Used combined stores for 1 stores
243, Generating present_or_create(numGridpoints)
Generating present_or_create(power)
Generating present_or_create(thisCellPower)
Generating present_or_create(num_ones)
Generating present_or_create(temp)
Generating present_or_create(iDM)
Generating present_or_create(iDMax)
Generating present_or_create(iDMin)
Generating present_or_create(iC)
Generating present_or_create(k2)
Generating present_or_create(k1)
Generating present_or_create(kk2)
Generating present_or_create(ATT_RAIN)
Generating present_or_copy(Vol_pos_s[0:])
Generating present_or_copyin(mpos_s1[0:])
Generating present_or_copy(Vol_pos_s_no_m[0:])
Generating present_or_copyin(DCM_s_to_be[0:3][0:])
Generating present_or_copy(rel_pos_norm_be_bar[0:])
Generating present_or_copyin(M_ant[0:][0:])
Generating present_or_copy(dir_s_norm[0:])
Generating present_or_copy(Volume_dir_ant[0:])
Generating present_or_copyin(azimuths1[0:179])
Generating present_or_copyin(rain_tab[0:][0:])
Generating present_or_copy(test1[0:24][0:2])
Generating present_or_copy(r[0:][0:])
Generating present_or_copyin(el_range[0:])
Generating present_or_copy(antel[0:])
Generating present_or_copyin(az_range[0:])
Generating present_or_copyin(sum_data2[0:][0:][0:])
Generating present_or_copy(p1[0:])
Generating present_or_copy(p2[0:])
Generating present_or_copy(p3[0:])
Generating present_or_copy(p4[0:])
Generating present_or_copy(complexGain[0:])
Generating present_or_copyin(m_vel_ant[0:])
Generating present_or_copyin(Range_vector[0:nr])
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
249, Loop is parallelizable
Accelerator kernel generated
249, #pragma acc loop gang /* blockIdx.x */
CC 1.3 : 108 registers; 136 shared, 836 constant, 40 local memory bytes
CC 2.0 : 63 registers; 120 shared, 736 constant, 0 local memory bytes
368, #pragma acc loop vector(128) /* threadIdx.x */
272, Loop is parallelizable
356, Loop is parallelizable
368, Loop is parallelizable
470, Loop is parallelizable
570, Loop is parallelizable
644, Loop is parallelizable
651, Loop is parallelizable
768, Loop is parallelizable
769, Loop is parallelizable
774, Loop is parallelizable
785, Loop is parallelizable
790, Loop is parallelizable
966, Loop is parallelizable
1035, Loop is parallelizable
PGC/x86-64 Windows 12.10-0: compilation successful
MINI~1.SD\AppData\Local\Temp\pgcc3brWzbZNHXXTGU.sm

“C:/Program Files/PGI/win64/12.10/bin\pgsmart.EXE” -agg 0x62000020 -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4czWzblMoQTYVE.s C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3brWzbZNHXXTGU.sm

“C:/Program Files/PGI/win64/12.10/bin\as64.EXE” C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4czWzblMoQTYVE.s “-IC:\Program Files\MATLAB\R2011b\extern\include/” “-IC:\Program Files\MATLAB\R2011b\simulink\include/” -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dHWzbJxJm4gUH.obj

“C:/Program Files/PGI/win64/12.10/bin\pgcnv.EXE” C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dHWzbJxJm4gUH.obj RainClutter.obj
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3brWzbZNHXXTGU.sm
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4czWzblMoQTYVE.s
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dHWzbJxJm4gUH.obj
Unlinking directory C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2ajWzbBe552NLl.ext
File with unknown suffix passed to linker: /DLL
File with unknown suffix passed to linker: /export:mexFunction
File with unknown suffix passed to linker: /implib:C:\Users\ADMINI~1.SD\AppData\Local\Temp\mex_BPwzBU\templib.x
File with unknown suffix passed to linker: /MACHINE:X64
File with unknown suffix passed to linker: /LIBPATH:C:\Program Files\MATLAB\R2011b\extern\lib\win64\microsoft;C:\Program Files\PGI\win64\12.10\lib

Hi Jon,

Please, note that is a mex function for Matlab and all the environment I built up is properly working to get such Matlab extensions to properly run.

Interesting. I have a background project to write an article on using OpenACC in Matlab, but unfortunately I have gotten sidetracked with other projects, so I haven’t had the opportunity to work on it. Glad to see that you are experimenting with it.


I’m not liking the schedule being generated:

249, Loop is parallelizable
Accelerator kernel generated
249, #pragma acc loop gang /* blockIdx.x */
CC 1.3 : 108 registers; 136 shared, 836 constant, 40 local memory bytes
CC 2.0 : 63 registers; 120 shared, 736 constant, 0 local memory bytes
368, #pragma acc loop vector(128) /* threadIdx.x */
272, Loop is parallelizable
356, Loop is parallelizable
368, Loop is parallelizable
470, Loop is parallelizable
570, Loop is parallelizable
644, Loop is parallelizable
651, Loop is parallelizable
768, Loop is parallelizable
769, Loop is parallelizable
774, Loop is parallelizable
785, Loop is parallelizable
790, Loop is parallelizable
966, Loop is parallelizable
1035, Loop is parallelizable

It looks to me like you’re using the “parallel” construct and only have loop directives around the loops at lines 249 and 368. The rest of the loops are parallelizable, but are being executed sequentially within the “gang”.

What I’d like you to try is to change to using the “kernels” construct and remove any loop directives. This will allow the compiler to generate what it thinks is the best schedule. I’m not sure this will fix the problem, but I’m curious what it comes up with.
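As a minimal sketch of that suggestion (not the poster's actual code, which was not shared), the hypothetical function `scale_add` below wraps a small loop nest in a single “kernels” region with no loop directives, leaving the schedule to the compiler. The array names, sizes, and data clauses are assumptions made for illustration. A compiler without OpenACC support ignores the pragma and runs the loops sequentially.

```c
#define NI 4
#define NJ 8

/* Hypothetical example: out[i][j] = 2*in[i][j] + i. */
static void scale_add(float in[NI][NJ], float out[NI][NJ])
{
    /* One kernels region and no loop directives: the compiler is free
       to choose the gang/vector schedule for the whole nest. */
    #pragma acc kernels copyin(in[0:NI][0:NJ]) copyout(out[0:NI][0:NJ])
    {
        for (int i = 0; i < NI; ++i) {
            for (int j = 0; j < NJ; ++j) {
                out[i][j] = 2.0f * in[i][j] + (float)i;
            }
        }
    }
}
```

Rebuilding with -Minfo=accel after a change like this shows which loops the compiler mapped to gangs and vectors, which can then be compared with the schedule quoted above.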

-Mat

Dear mkcolg,

The code is now running almost perfectly. The problem was that the nested loops were not rectangular. This produced strange behavior, with the loop bounds set to zero and the loops never executed.
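To illustrate what "not rectangular" means here, the sketch below contrasts a triangular nest, whose inner bound depends on the outer index, with one common rectangular rewrite that uses a fixed inner bound and a guard in the loop body. The function names, the size `N`, and the data clauses are hypothetical; this is not necessarily the change made in the original code.

```c
#define N 6

/* Triangular (non-rectangular) nest: the inner bound depends on i. */
static void row_sums_triangular(float a[N][N], float sums[N])
{
    for (int i = 0; i < N; ++i) {
        float s = 0.0f;
        for (int j = 0; j <= i; ++j)
            s += a[i][j];
        sums[i] = s;
    }
}

/* Rectangular rewrite: fixed inner bound, with the triangular
   condition moved into a guard inside the loop body. */
static void row_sums_rectangular(float a[N][N], float sums[N])
{
    #pragma acc kernels copyin(a[0:N][0:N]) copyout(sums[0:N])
    {
        for (int i = 0; i < N; ++i) {
            float s = 0.0f;
            for (int j = 0; j < N; ++j) {
                if (j <= i)
                    s += a[i][j];
            }
            sums[i] = s;
        }
    }
}
```

Both versions compute the same sums. The rectangular form does some redundant iterations, but every loop in the nest has bounds that are known before the nest starts.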

With this fixed, I now have the problem that an array that I initialize to zero before the accelerated region is no longer initialized inside the region. It accumulates a lot of garbage and produces an Inf as output instead of the correct result after a summation r_[j] += k is performed. This is the only remaining problem; otherwise the code performs really well, with an exceptional gain of about two orders of magnitude over normal Matlab code.

Mex function+PGI compilers work and perform well. It is time that Mathworks supports PGI compilers.

Jon

Generating present_or_copy(r[0:][0:])

It appears that r is being copied to the device so I’m not sure why this would occur. Maybe the working set of r is smaller than the actual size so it’s not being mapped correctly? Try setting the full extent of both dimensions.

Could r be a ragged array or not contiguous in memory?
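One way to rule out both possibilities is sketched below, under the assumption that the accumulator can be stored as a single contiguous block: the hypothetical function `accumulate` allocates `rows*cols` floats in one piece, zeroes them on the host, and names the full extent in the `copy` clause. The function name, parameters, and flattened indexing are illustrative and not taken from the original code.

```c
#include <stdlib.h>

/* Hypothetical accumulator: one contiguous block of rows*cols floats,
   indexed as r[i*cols + j] and zeroed on the host by calloc. */
static float *accumulate(int rows, int cols, float k)
{
    float *r = calloc((size_t)rows * (size_t)cols, sizeof *r);
    if (r == NULL)
        return NULL;

    /* The full extent is given explicitly, so the whole array,
       including the host-side zeros, is copied in and back out. */
    #pragma acc kernels copy(r[0:rows*cols])
    {
        for (int i = 0; i < rows; ++i)
            for (int j = 0; j < cols; ++j)
                r[i * cols + j] += k;
    }
    return r;  /* caller frees */
}
```

A ragged array built from separately allocated rows has no single base address and length to describe in a data clause, which is why a flattened layout like this one is easier to map.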

Mex function+PGI compilers work and perform well. It is time that Mathworks supports PGI compilers.

The folks at MathWorks are very open to supporting PGI, and I have contacts there. Unfortunately, I haven’t had time to push them on it. Please do contact them with your experience, since a push from a user goes further than one from a vendor.

-Mat