I have an application that I’m running on an 8-core (four dual-core CPUs) NUMA system running Fedora Core 4 Linux. I would like the Portland Group (PGI) compiler to spread the application (which normally runs as a single process) across more processors to speed it up. I compiled the application with -Mconcur=numa (the binary dynamically links libnuma.so) and ran it, but I can see that it’s using only one CPU; all the other CPUs show 0% usage. The runtime also appears to be slightly longer than when I don’t use -Mconcur.
Do you know of anything that I could be missing?