How to use the "PRINT" statements

I am trying to use the “print” statement in device code and wrote this test program:

	attributes(global) subroutine dev_test()
		print *, 'dev'
	end subroutine dev_test

	program main_test
		use cudafor
		call dev_test<<<1>>>()
		print *, cudaGetErrorString(cudaGetLastError())
	end program main_test

I compile the program with “pgfortran -Mcuda -ta=nvidia:cc20,cuda4.0 test.f90”. But when I run it on a Tesla M2050, I only get the “no error” message. What’s wrong with it?

Hi Steve,

Your program is ending before it can get the print back from the device. Kernel calls are asynchronous to the host code and you have nothing to block the host’s execution. Try adding a “cudaThreadSynchronize” just after the kernel call.

Note that I also put your routine into a module, since all global device routines must have an implicit or explicit interface. The missing interface is not causing the print problem, but it would cause issues later if you tried to pass in any arguments.

Finally, you should avoid printing from the device since it will severely limit your performance.

  • Mat
% cat prt.cuf 
   module foo
   contains
	
   attributes(global) subroutine dev_test()
      print *, 'dev'
   end subroutine dev_test

   end module foo

   program main_test
      use cudafor
      use foo
      call dev_test<<<1>>>()
      ierr = cudaThreadSynchronize()
      print *, cudaGetErrorString(cudaGetLastError())
   end program main_test 
% pgf90 prt.cuf; a.out
 dev
 no error
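
To tie the three points above together (synchronize before the host exits, keep kernels in a module, print sparingly), here is a minimal sketch rather than a definitive version. The module name `kernels`, the kernel `scale`, the array size, and the launch configuration are all made up for illustration. It assumes a toolkit that provides `cudaDeviceSynchronize`, which replaces the deprecated `cudaThreadSynchronize`, and it writes the execution configuration with both a grid and a block size.

```fortran
module kernels
   implicit none
contains
   ! Living in a module gives the kernel an explicit interface,
   ! which is what lets it take arguments safely.
   attributes(global) subroutine scale(a, n, s)
      integer, value :: n
      real(8), value :: s
      real(8) :: a(n)              ! device array passed from the host
      integer :: i
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) a(i) = s * a(i)
      ! Print from one thread only, to limit the cost of device prints.
      if (i == 1) print *, 'dev: a(1) =', a(1)
   end subroutine scale
end module kernels

program main_test
   use cudafor
   use kernels
   implicit none
   integer, parameter :: n = 256
   real(8), device :: a_d(n)
   integer :: ierr

   a_d = 1.0d0
   call scale<<<(n + 127) / 128, 128>>>(a_d, n, 2.0d0)

   ! Errors from the launch itself (e.g. a bad configuration) appear here.
   ierr = cudaGetLastError()
   if (ierr /= cudaSuccess) print *, 'launch: ', cudaGetErrorString(ierr)

   ! Block the host until the kernel finishes, so device output is flushed
   ! before the program ends.
   ierr = cudaDeviceSynchronize()
   if (ierr /= cudaSuccess) print *, 'sync: ', cudaGetErrorString(ierr)
end program main_test
```

Checking the return value of the synchronize call, instead of discarding it in `ierr`, also reports errors that only occur while the kernel is running.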

Thanks, Mat.
It works. But on my machine, the code

call dev_test<<<1>>>()

should be " call dev_test<<<1>>>() " . :-)

I have another question. In device code, how can I print the value of a real(8) variable? In my code, “print *” only prints 6 decimals.

Hi Steve,

I did some digging but there’s not much we can do right now. We don’t have a formatted print available, just list directed, and we’re at the mercy of the underlying NVIDIA CUDA print, which looks to be giving only 6 decimals.

  • Mat
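
One possible workaround, which is an assumption on my part and not something confirmed in this thread, is to copy the value back to the host and print it there, where formatted output is available. The sketch below uses made-up names (`dbg`, `compute`, `r_d`, `r_h`) and a one-element device array to carry the value.

```fortran
module dbg
   implicit none
contains
   attributes(global) subroutine compute(r)
      real(8) :: r(1)
      r(1) = 1.0d0 / 3.0d0
      print *, 'dev:', r(1)        ! list-directed print on the device
   end subroutine compute
end module dbg

program main_test
   use cudafor
   use dbg
   implicit none
   real(8), device :: r_d(1)       ! result in device memory
   real(8) :: r_h(1)               ! host copy used only for printing

   call compute<<<1, 1>>>(r_d)

   ! Assignment from a device array is a blocking device-to-host copy,
   ! so it also waits for the kernel to finish.
   r_h = r_d

   ! Formatted output on the host shows full double precision.
   print '(a, f20.16)', 'host: ', r_h(1)
end program main_test
```

This only helps for values that can be written to device memory and inspected after the kernel returns. It does not change how many digits the device-side print itself shows.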