Core binding on more cores than found at compile time

Hi,

I’m running into a problem that feels like it must be user error, but I can’t determine how to fix it. If I compile code containing OpenMP pragmas on a system with 8 cores, it runs fine locally and on every system I try with 8 or fewer cores, whether or not I use MP_BIND (compiling with either -mp=allcores or -mp=bind works fine). When I run a binary compiled on that system on a 12-core system with binding turned on, I get the following and then the program crashes:

mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address

On a system with 24 cores, again crashing:

mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address
mbind: Bad address

It’s always the number of cores minus 8. It doesn’t seem to matter whether I specify MP_BLIST or anything else: if binding is turned on, I get that message once for every core beyond the count on the compiling machine. Is this expected behavior? If so, is there some way to tell the runtime about more cores than the compiling machine has, so I can actually use the extras?
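
For reference, here is roughly how I turn binding on at run time (bash shown; the MP_BLIST core list is just an example, and the failure happens with or without it):

    export MP_BIND=yes
    export MP_BLIST=0,1,2,3,4,5,6,7,8,9,10,11
    ./a.out    # a.out stands in for the actual binary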

Hi njustn,

This is a new one, so I’m not sure what’s wrong. Nothing in how we bind is tied to the compilation system; it’s all determined at runtime, so this behaviour is odd.

The error itself is coming out of the NUMA library, so I’m wondering if the problem lies there. First, can you run “ldd my.exe” and see which libnuma is being picked up? Also, did you link statically (-Bstatic)? Finally, what happens if you don’t use libnuma (-mp=nonuma)?

  • Mat

Hi Mat,

I did not link statically, so it’s picking up the libnuma on the other machine. It may be a different version, but then I would expect it to fail on all the cores rather than only on the cores beyond the first 8…

ldd output:

        linux-vdso.so.1 =>  (0x00007fff967ff000)
        libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007fef98774000)
        libcudart.so.4 => /usr/local/cuda/lib64/libcudart.so.4 (0x00007fef9851b000)
        libm.so.6 => /lib/libm.so.6 (0x00007fef98299000)
        libdl.so.2 => /lib/libdl.so.2 (0x00007fef98095000)
        libcolamd.so.2.7.1 => /usr/lib/libcolamd.so.2.7.1 (0x00007fef97e8d000)
        libnuma.so => /usr/lib/libnuma.so (0x00007fef97c85000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00007fef97a69000)
        libc.so.6 => /lib/libc.so.6 (0x00007fef97706000)
        libz.so.1 => /usr/lib/libz.so.1 (0x00007fef974ef000)
        librt.so.1 => /lib/librt.so.1 (0x00007fef972e7000)
        libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fef96fd2000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007fef96dbc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fef99368000)

Compiling with -mp=nonuma does allow the program to run whether I set bind or not, but does it still bind the threads in that case? I had been using a workaround of manually binding the threads myself with CPU_SET, so I know it can be done without libnuma, but does the runtime do it?
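
The workaround looks roughly like this (a minimal sketch only; it assumes thread ids map directly onto core ids and that there are at least as many cores as threads):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel
        {
            cpu_set_t set;
            int tid = omp_get_thread_num();

            CPU_ZERO(&set);
            CPU_SET(tid, &set);   /* pin this thread to core number 'tid' */

            /* pid 0 means "the calling thread" on Linux */
            if (sched_setaffinity(0, sizeof(cpu_set_t), &set) != 0)
                perror("sched_setaffinity");
        }
        return 0;
    }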

libnuma.so => /usr/lib/libnuma.so (0x00007fef97c85000)

Ok, it’s using the system’s libnuma and not our dummy version.

Compiling with -mp=nonuma does allow the program to run whether I set bind or not, but does it still bind the threads in that case?

No, it’s not binding in this case, though it does tell us that the problem lies with libnuma (or with how the PGI runtime interacts with it).

I had been using a workaround of manually binding the threads myself with CPU_SET, so I know it can be done without libnuma, but does the runtime do it?

I personally use the ‘numactl’ or ‘taskset’ utilities, but have not used CPU_SET myself.
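
For example, something along these lines (the 0-11 core list is just illustrative):

    numactl --physcpubind=0-11 ./my.exe
    taskset -c 0-11 ./my.exe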

Are you able to determine the libnuma.so version? Also, which OS is each system running?

Can you compile (with binding, i.e. just -mp) and run on the 12-core system to see if the issue still occurs?

Thanks,
Mat

Hi,

I had mysterious “mbind: Bad address” errors when I did not call numa_available() before any other libnuma call, as the man page instructs.

This occurred only on Intel machines, including the one I used for compiling.
Running the binary on an AMD machine worked fine without calling numa_available().
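
In other words, something like this right at program start (a minimal sketch; compile and link with -lnuma):

    #include <stdio.h>
    #include <numa.h>

    int main(void)
    {
        /* The man page says numa_available() must be called before any other
           libnuma function; -1 means the rest of the API must not be used. */
        if (numa_available() == -1) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        /* ... only now is it safe to call numa_run_on_node(), numa_alloc_onnode(), etc. */
        return 0;
    }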

  • Robert