OK, you may have noticed that I keep posting messages about OpenMP. In a previous message I reported an OpenMP problem that I have since resolved, and my code (built with PVF) now runs correctly. I build with optimization enabled; here are the command-line options:
-Bstatic -Mbackslash -mp -I"c:\program files\pgi\win64\18.4\include" -I"C:\Program Files\PGI\Microsoft Open Tools 14\include" -I"C:\Program Files (x86)\Windows Kits\10\Include\shared" -I"C:\Program Files (x86)\Windows Kits\10\Include\um" -fast -O3 -tp=haswell-64,penryn-64,p7-64,sandybridge-64,px-64 -Minform=warn
I ran a relatively large problem, and I am confident that my code is well suited to multithreaded OpenMP execution: it does run faster with multiple threads than with a single thread, as expected.
I was surprised to see that my program took 6 minutes to complete with 16 threads, while the exact same program built with Intel's compiler (at optimization level O2) takes about 25 seconds! I would not expect the PVF-built version to be roughly 15 times slower than the Intel-built version.
In addition, the PVF-built program is SHOCKINGLY SLOW in read/write operations. However, I do not think this is the main reason the program is so much slower overall.
I would be grateful for any thoughts on why PVF produces a program that is so much slower than the corresponding Intel-built version, and on what I could do to speed up the read/write operations.
Thank you in advance for your help.