I’m debugging a CUDA Fortran code. In emulation mode it works fine.
But it fails when executed on the GPU, with this error message:
copyout Memcpy (host=0x7c5c60, dev=0x1e720000, size=1310720) FAILED: 4(unspecified launch failure)
Note: I tested with the “-Mcuda=emu -Mbounds” options, so out-of-bounds array accesses should not be the problem.
Since I/O is not allowed inside a CUDA kernel, debugging by hand is a real challenge. So I decided to comment out some lines to narrow down the problem.
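One alternative to commenting out lines is to write intermediate values into an extra device array and to check the error status right after the launch. Below is a minimal sketch of that idea, not my real code: the names debug_demo, probe_kernel and dbg are made up for illustration, and I use cudaThreadSynchronize on the assumption of an older CUDA toolkit (newer ones provide cudaDeviceSynchronize instead).

```fortran
! Hypothetical sketch: module, kernel and array names are illustrative only.
module debug_demo
  use cudafor
  implicit none
contains
  attributes(global) subroutine probe_kernel(a, dbg, n)
    integer, value :: n
    real, device :: a(n), dbg(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) then
      a(i) = 2.0 * a(i)
      dbg(i) = a(i)   ! record an intermediate value instead of printing it
    end if
  end subroutine probe_kernel
end module debug_demo

program main
  use cudafor
  use debug_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a_h(n), dbg_h(n)
  real, device :: a_d(n), dbg_d(n)
  integer :: istat

  a_h = 1.0
  a_d = a_h        ! host-to-device copy
  dbg_d = 0.0

  call probe_kernel<<<(n + 255) / 256, 256>>>(a_d, dbg_d, n)

  ! Errors in the launch itself (e.g. a bad configuration) are reported here.
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', trim(cudaGetErrorString(istat))

  ! Errors during kernel execution are reported once the kernel has finished.
  istat = cudaThreadSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', trim(cudaGetErrorString(istat))

  dbg_h = dbg_d    ! device-to-host copy of the recorded values
  print *, 'dbg(1:4) = ', dbg_h(1:4)
end program main
```

Checking the status directly after the kernel should at least show whether the failure comes from the kernel itself, rather than only being reported later by the copyout Memcpy.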
Question 1: Are there any debugging tools for pgfortran CUDA code?
Another important thing for debugging is to turn off compiler optimization. I tried “pgfortran -O0”, but it does not disable optimization of the CUDA code.
Question 2: How can I disable optimization of CUDA Fortran code?
I ran a test with the following compiler options:
pgfortran -Mmpi=mpich1 -Mcuda=keepgpu -Mcuda=keepbin -Mcuda=keepptx -Mcuda=ptxinfo
The following intermediate files were kept from compiling the kernel module
(all global and device subroutines are collected in the raycast_GPUkernel module):
- raycast_GPUkernel.001.gpu – ANSI C code file, converted from raycast_GPUkernel.F90
- raycast_GPUkernel.001.h – header file for file 1; file 1 also requires two more include files: “cuda_runtime.h” and “pgi_cuda_runtime.h”
- raycast_GPUkernel.002.bin
- raycast_GPUkernel.002.ptx – PTX assembly file, PTX version 1.4 (target sm_13)
- raycast_GPUkernel.003.bin
- raycast_GPUkernel.003.ptx – PTX assembly file, PTX version 2.1 (target sm_20)?
So is a CUDA Fortran code first converted to CUDA C by pgfortran, and then compiled into a PTX file by an embedded nvcc-like tool?
In the two .ptx files (files 4 and 6 in the list above), the header contains the following text:
.version 2.1 !--- for .002.ptx, it's .version 1.4
.target sm_20 !--- sm_13
// compiled with /opt/applications/pgi/linux86-64/2011/cuda/3.1/open64/lib/be
//-----------------------------------------------------------
// Compiling /tmp/pbs.15539.service0/pgnvdvLwd-JE4MrdP.nv4 (/tmp/pbs.15539.ss
ervice0/pgnvdLLwdVIRdSi62.nv6)
//-----------------------------------------------------------
//-----------------------------------------------------------
// Options:
//-----------------------------------------------------------
// Target:ptx, ISA:sm_20, Endian:little, Pointer Size:64
// -O3 (Optimization level)
// -g0 (Debug level)
// -m2 (Report advisories)
//-----------------------------------------------------------
I think this “-O3” is set automatically by pgfortran and does not depend on the “-O” option the user passes to pgfortran.
Question 3: Is there a way for the user to compile raycast_GPUkernel.001.gpu directly with nvcc?
Thanks in advance.