Compilation time

Hi all,
I work on a Mac Pro with two quad-core CPUs and a GeForce GTX 285.
I have three global subroutines that call some device functions.
When I compile the code, the compilation alone takes more than 10 minutes with single precision for the real numbers and more than 25 minutes with double precision.
The compilation command is:

pgfortran -r4 -i4 -Minfo -tp nehalem-32 -c name.cuf

Is this expected? Can I reduce the compilation time to speed up development?

Thanks to all,
Enrico

Hi Enrico,

Try using the CUDA 3.1 tool chain (i.e. -Mcuda=cuda3.1). Another user had a similar issue (see “Compilation speed of different compiler versions”), where the problem was that the older NVIDIA PTX assembler was quite slow. The newer version seemed to help.

- Mat
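
To check whether the toolkit switch actually helps, the two builds can be timed side by side. The following is only a rough sketch, not an official procedure: the file name name.cuf is taken from the original command, the helper build_cmd and the SRC variable are made up for illustration, and -Mcuda=cuda3.1 is only accepted by compiler releases that list it in their -Mcuda help text.

```shell
#!/bin/sh
# Sketch: time the same compile with and without the suggested toolkit option.
# SRC is a placeholder; override it with the real source file.
SRC=${SRC:-name.cuf}
BASE_FLAGS="-r4 -i4 -Minfo -tp nehalem-32 -c"

# build_cmd (hypothetical helper): print the compile command,
# adding -Mcuda=<sub-option> when a sub-option is given.
build_cmd() {
    if [ -n "$1" ]; then
        echo "pgfortran $BASE_FLAGS -Mcuda=$1 $SRC"
    else
        echo "pgfortran $BASE_FLAGS $SRC"
    fi
}

for opt in "" "cuda3.1"; do
    cmd=$(build_cmd "$opt")
    echo "== $cmd"
    if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
        start=$(date +%s)
        # Unquoted on purpose so the string splits into command and arguments.
        $cmd
        end=$(date +%s)
        echo "elapsed: $((end - start)) s"
    else
        echo "(skipped: pgfortran or $SRC not found)"
    fi
done
```

If the compiler rejects the cuda3.1 keyword, that run fails quickly and only the baseline timing is meaningful.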

Hi mkcolg,
thank you for your response.
When I use the “-Mcuda=cuda3.1” option, the compiler reports the following:

pgfortran -r4 -i4 -Minfo -Mcuda=cuda3.1 -tp nehalem-32 -ta=nvidia -c src/MoM_mod_GPU_true.cuf
pgfortran-Error-Switch -Mcuda with unknown keyword cuda3.1
-Mcuda[=emu|cc10|cc11|cc12|cc13|cc20|cuda2.3|2.3|cuda3.0|3.0|fastmath|keepgpu|keepbin|keepptx|maxregcount:<n>|nofma]
                    Enable CUDA Fortran
    emu             Enable emulation mode
    cc10            Compile for compute capability 1.0
    cc11            Compile for compute capability 1.1
    cc12            Compile for compute capability 1.2
    cc13            Compile for compute capability 1.3
    cc20            Compile for compute capability 2.0
    cuda2.3         Use CUDA 2.3 Toolkit compatibility
    2.3             Use CUDA 2.3 Toolkit compatibility
    cuda3.0         Use CUDA 3.0 Toolkit compatibility
    3.0             Use CUDA 3.0 Toolkit compatibility
    fastmath        Use fast math library
    keepgpu         Keep kernel source files
    keepbin         Keep CUDA binary files
    keepptx         Keep PTX portable assembly files
    maxregcount:<n> Set maximum number of registers to use on the GPU
    nofma           Don't generate fused mul-add instructions

I’m now updating PGI Accelerator and the CUDA toolkit.
I’ll let you know how it goes.

Thanks,
Enrico