While this is probably an instance of my Make environment being too generic, I thought I’d ask here: is it possible to use -Mcuda with -tp=[two processors]?
For example, I was compiling our model with -Mcuda=fastmath,ptxinfo,4.1,cc20 as well as -tp=nehalem-64,sandybridge-64 and all sorts of errors appeared. So I found an old CUF kernel test module and tried compiling it:
(1042) > pgfortran -Mcuda=cc20,4.1 -Minfo=all -tp=nehalem-64,sandybridge-64 test.F90
madd_dev:
     13, PGI Unified Binary version for -tp=sandybridge-64
     20, CUDA kernel generated
         20, !$cuf kernel do <<< (*,*), (32,1) >>>
     22, Sum reduction generated for sum
madd_dev:
     13, PGI Unified Binary version for -tp=nehalem-64
     20, CUDA kernel generated
         20, !$cuf kernel do <<< (*,*), (32,1) >>>
     22, Sum reduction generated for sum
/gpfsm/dnb31/tdirs/login/dscvr17.535.mathomp4/pgcudaforVZldn2J1Bx4d.gpu(178): error: function "madd_dev_20_gpu" has already been defined
/gpfsm/dnb31/tdirs/login/dscvr17.535.mathomp4/pgcudaforVZldn2J1Bx4d.gpu(267): error: function "madd_dev_22_gpu_red" has already been defined
2 errors detected in the compilation of "/gpfsm/dnb31/tdirs/login/dscvr17.535.mathomp4/pgnvdH0ldJtW2HxiD.nv0".
PGF90-F-0000-Internal compiler error. Device compiler exited with error status code 0 (test.F90: 25)
PGF90/x86-64 Linux 12.6-0: compilation aborted
It looks like the device compiler generated the GPU kernels once per -tp target, so the second copy collided with the one it had already emitted!
Now, the obvious fix on my end is to avoid the dual -tp when I'm compiling for the GPU (and this will be tested soon), since at the moment my GPUs only sit next to Westmeres anyway.
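For anyone curious what that split would look like, here is a sketch of a Makefile fragment that pins the CUDA Fortran objects to a single -tp while keeping the unified binary for host-only code. The file names, variable names, and the particular file split are my own illustration, not from our actual build:

```shell
# Hypothetical Makefile fragment: GPU objects get one -tp target,
# host-only objects keep the dual-target unified binary.
FC         = pgfortran
TP_UNIFIED = -tp=nehalem-64,sandybridge-64   # host-only unified binary
TP_GPU     = -tp=nehalem-64                  # single target for CUDA Fortran objects
CUDA_FLAGS = -Mcuda=fastmath,ptxinfo,4.1,cc20

# Host-only sources compile as a PGI Unified Binary...
host_mod.o: host_mod.F90
	$(FC) $(TP_UNIFIED) -c $<

# ...while CUDA Fortran sources are pinned to a single -tp to avoid
# the duplicate-kernel-definition error shown above.
gpu_mod.o: gpu_mod.F90
	$(FC) $(TP_GPU) $(CUDA_FLAGS) -c $<
```

This obviously gives up the unified binary for the GPU-enabled objects, which is exactly why the question below still matters.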
But in the future I could well have GPUs attached to both Westmere and Sandy Bridge nodes, so this question would be useful to have an answer to: is there a way to build a -tp unified binary that also targets GPUs?