Hi to all,
I work on a Mac Pro with two quad-core CPUs and a GeForce GTX285.
I have three global subroutines that call some device functions.
When I compile the code, the compilation alone takes more than 10 minutes using single (float) precision for the real numbers, and more than 25 minutes using double precision.
The compilation command is:
Try using the CUDA 3.1 tool chain (i.e. -Mcuda=cuda3.1). Another user had a similar issue (see “Compilation speed of different compiler versions”), where the problem was that the older NVIDIA PTX assembler was quite slow. The newer version seemed to help.
Hi mkcolg,
thank you for your response.
When I use the “-Mcuda=cuda3.1” option, the shell gives me this output:
pgfortran -r4 -i4 -Minfo -Mcuda=cuda3.1 -tp nehalem-32 -ta=nvidia -c src/MoM_mod_GPU_true.cuf
pgfortran-Error-Switch -Mcuda with unknown keyword cuda3.1
-Mcuda[=emu|cc10|cc11|cc12|cc13|cc20|cuda2.3|2.3|cuda3.0|3.0|fastmath|keepgpu|keepbin|keepptx|maxregcount:<n>|nofma]
                    Enable CUDA Fortran
    emu             Enable emulation mode
    cc10            Compile for compute capability 1.0
    cc11            Compile for compute capability 1.1
    cc12            Compile for compute capability 1.2
    cc13            Compile for compute capability 1.3
    cc20            Compile for compute capability 2.0
    cuda2.3         Use CUDA 2.3 Toolkit compatibility
    2.3             Use CUDA 2.3 Toolkit compatibility
    cuda3.0         Use CUDA 3.0 Toolkit compatibility
    3.0             Use CUDA 3.0 Toolkit compatibility
    fastmath        Use fast math library
    keepgpu         Keep kernel source files
    keepbin         Keep CUDA binary files
    keepptx         Keep PTX portable assembly files
    maxregcount:<n> Set maximum number of registers to use on the GPU
    nofma           Don't generate fused mul-add instructions
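So the installed compiler only knows the keywords listed in that usage line, and cuda3.1 is not among them. As a rough sketch of how one could check a keyword against that list before compiling: the usage string below is copied by hand from the output above, and `supports_keyword` is a hypothetical helper name of my own, not part of the PGI tools.

```shell
# Usage line copied verbatim from the pgfortran error output above
# (assumption: pasted by hand, not queried from the compiler).
usage='-Mcuda[=emu|cc10|cc11|cc12|cc13|cc20|cuda2.3|2.3|cuda3.0|3.0|fastmath|keepgpu|keepbin|keepptx|maxregcount:<n>|nofma]'

# Hypothetical helper: prints "yes" if the given keyword is one of the
# "|"-separated entries in the usage line, "no" otherwise.
supports_keyword() {
    list=${usage#*=}      # drop the leading "-Mcuda[="
    list=${list%\]}       # drop the trailing "]"
    case "|$list|" in
        *"|$1|"*) echo yes ;;
        *)        echo no ;;
    esac
}

supports_keyword cuda3.1   # prints "no"
supports_keyword cuda3.0   # prints "yes"
```

With this installation, cuda3.0 would be the newest toolkit keyword accepted, which is why an update is needed to try the cuda3.1 suggestion.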
Now I’m updating the PGI Accelerator and the CUDA toolkit.
I’ll let you know what happens.