We have some code accelerated on an NVIDIA S1070 GPU (compute capability 1.3) using PGI Accelerator directives with PGI 10.9 (we are not using PGI 11.1 because it gives internal compiler errors for some reason; the code compiles fine with 10.9).
The NVIDIA driver we are using is the CUDA 3.1 driver, since the PGI 10.9 manual still does not certify the compiler against a CUDA 3.2 driver.
The code runs fine and produces correct results on the S1070 GPU.
Next we wanted to run the code on an S2050 GPU (Fermi, compute capability 2.0). We noticed that, by default, the compiler creates a compute capability 1.3 binary even though we are running on a Fermi. To override the default, we use the cc20 suboption ("-ta=nvidia,time,cuda3.0,cc20"), which produces a compute capability 2.0 binary.
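For reference, the compile lines we are using look roughly like this (the source file and output names here are just illustrative, not our actual build):

```shell
# Default target: the compiler generates a compute capability 1.3 binary,
# even when building for/running on the S2050 (Fermi)
pgf90 -ta=nvidia,time,cuda3.0 -o app_cc13 main.f90

# With the cc20 suboption: generates a compute capability 2.0 binary for Fermi
pgf90 -ta=nvidia,time,cuda3.0,cc20 -o app_cc20 main.f90
```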
If we don’t use the cc20 flag on Fermi, the code produces totally bizarre results.
When we do use the cc20 flag on Fermi, the code runs but produces results that are slightly off from what we expect (the same run on the S1070 produces the correct, expected results).
Are there any special settings needed on the PGI side to get things working correctly on Fermi, other than compiling with the cc20 option?
I found a PGI forum posting by Michael Wolfe, which is where I got the cc20 option from:
We have NVIDIA involved in this as well, but no answers so far.
Thank you for your help.