Help me with my first CUDA Fortran program

l4linux · March 28, 2010, 10:23am

I downloaded PGI Workstation Complete 10.3 for Linux x86_64, read the PDF documentation, and wrote my first CUDA Fortran program, but I can't get it to compile.

Thanks a lot.

module cacmu
  use cudafor
contains
  attributes(global) subroutine cac(n,x)
    implicit none
    integer :: n
    real :: x
    integer :: i
    x=0.0
    do i=1,N
      x=x+real(i)
    enddo
  end subroutine cac
end module cacmu

program main
  use cudafor
  use cacmu
  implicit none
  integer :: n_=1000000*64
  real :: x_
  call cac<<<n_/64,64>>>(n_,x_)
  print *,x_
end program main

Then I compiled it as follows:
$ pgf95 1.cuf
PGF90-S-0188-Argument number 1 to cac: type mismatch (1.cuf: 22)
PGF90-S-0188-Argument number 2 to cac: type mismatch (1.cuf: 22)
0 inform, 0 warnings, 2 severes, 0 fatal for main

l4linux · March 28, 2010, 10:27am

My GPU is a Gigabyte GT 240 (1 GB GDDR5), which should support CUDA compute capability 1.2 (sm_12).

[root]# pgaccelinfo
CUDA Driver Version 3000

Device Number: 0
Device Name: GeForce GT 240
Device Revision Number: 1.2
Global Memory Size: 1073020928
Number of Multiprocessors: 12
Number of Cores: 96
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment 256B
Clock Rate: 1462 MHz
Initialization time: 14333 microseconds
Current free memory 1020030976
Upload time (4MB) 6511 microseconds (1371 ms pinned)
Download time 3893 microseconds ( 961 ms pinned)
Upload bandwidth 644 MB/sec (3059 MB/sec pinned)
Download bandwidth 1077 MB/sec (4364 MB/sec pinned)

BeachHut1 · March 29, 2010, 1:15pm

You have several problems with your code and with how you compile it. The two that prevent it from compiling: in your kernel, the dummy arguments should be declared integer, value :: n and real, value :: x, and you must compile with -Mcuda. Beyond that, I don't think you understand how GPU programming works yet: your kernel never references threads (threadIdx, blockIdx). Perhaps you were thinking of the PGI Accelerator model instead (#pragma acc in C, !$acc in Fortran).
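A minimal sketch of the compile fixes, assuming a CUDA Fortran compiler (PGI with -Mcuda, or a current nvfortran). Here n gets the value attribute, but x is kept as a device-resident dummy and the host passes a device variable (x_d, a hypothetical name). With real, value :: x the code would compile, yet the kernel would only update a private copy and the sum would never reach the host. The launch is reduced to a single thread because the loop is serial, and n is lowered to 1000 (an illustrative size) so the single-precision sum is exact.

```fortran
module cacmu
  use cudafor
  implicit none
contains
  ! n is a host scalar passed by value; x (no value attribute) refers to device memory
  attributes(global) subroutine cac(n, x)
    integer, value :: n
    real :: x
    integer :: i
    x = 0.0
    do i = 1, n
      x = x + real(i)
    end do
  end subroutine cac
end module cacmu

program main
  use cudafor
  use cacmu
  implicit none
  integer :: n_
  real :: x_
  real, device :: x_d        ! hypothetical device copy of the result
  n_ = 1000
  ! One block of one thread: the loop is serial, so more threads would
  ! only repeat the same sum and race on x_d.
  call cac<<<1,1>>>(n_, x_d)
  x_ = x_d                   ! device-to-host copy; waits for the kernel
  print *, x_                ! 1 + 2 + ... + 1000 = 500500
  if (x_ /= 500500.0) then
    print *, 'unexpected result'
    stop 1
  end if
end program main
```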

MatColgrove · March 30, 2010, 12:23am

Hi l4linux,

In CUDA, every thread executes the same kernel. In your example, every thread sequentially computes the same sum and writes it to the same x. As BeachHut suggests, you could get this code running, but I doubt it would be fast, or that it is what you intended.
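To illustrate the point, here is a hedged sketch (the module fillmod and kernel fill are hypothetical) in which every thread runs the same kernel but uses blockIdx, blockDim and threadIdx to compute its own element index, so the work is divided among threads instead of repeated by each one. It keeps 64 threads per block, as in the original launch.

```fortran
module fillmod
  use cudafor
  implicit none
contains
  ! Same code in every thread; each thread picks its own element
  attributes(global) subroutine fill(n, a)
    integer, value :: n
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x   ! 1-based global index
    if (i <= n) a(i) = real(i)                        ! guard the last, partial block
  end subroutine fill
end module fillmod

program main
  use cudafor
  use fillmod
  implicit none
  integer, parameter :: n = 1000, tpb = 64
  real :: a(n)
  real, device :: a_d(n)
  integer :: i
  ! Round the block count up so all n elements are covered
  call fill<<<(n + tpb - 1) / tpb, tpb>>>(n, a_d)
  a = a_d
  do i = 1, n
    if (a(i) /= real(i)) then
      print *, 'mismatch at', i
      stop 1
    end if
  end do
  print *, 'ok, a(n) =', a(n)
end program main
```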

While you can perform a sum reduction in parallel (I touch on it in my last PGI Insider article), it's fairly involved. Instead, you should consider starting with a simple matrix multiply (matmul) program (see the matmul example on the PGI website).
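For the curious, a minimal sketch of one common way to do a parallel sum, assuming a power-of-two block size: each block adds its slice in shared memory with a tree reduction and writes one partial sum, and the host adds the few partial sums. This is not the code from the PGI Insider article; the names redmod, partial_sums, tpb and nb are hypothetical. With n = 1000 every intermediate value is an integer below 2**24, so the single-precision result is exact.

```fortran
module redmod
  use cudafor
  implicit none
  integer, parameter :: tpb = 64        ! threads per block, must be a power of 2
contains
  ! Each block reduces tpb values of real(i) and writes one partial sum
  attributes(global) subroutine partial_sums(n, psum)
    integer, value :: n
    real :: psum(*)
    real, shared :: s(tpb)
    integer :: t, i, stride
    t = threadIdx%x
    i = (blockIdx%x - 1) * blockDim%x + t
    if (i <= n) then
      s(t) = real(i)
    else
      s(t) = 0.0                        ! pad the last block with zeros
    end if
    call syncthreads()
    ! Tree reduction: halve the number of active threads each step
    stride = tpb / 2
    do while (stride >= 1)
      if (t <= stride) s(t) = s(t) + s(t + stride)
      call syncthreads()
      stride = stride / 2
    end do
    if (t == 1) psum(blockIdx%x) = s(1)
  end subroutine partial_sums
end module redmod

program main
  use cudafor
  use redmod
  implicit none
  integer, parameter :: n = 1000
  integer, parameter :: nb = (n + tpb - 1) / tpb   ! number of blocks
  real :: psum(nb), total
  real, device :: psum_d(nb)
  call partial_sums<<<nb, tpb>>>(n, psum_d)
  psum = psum_d
  total = sum(psum)                     ! final small reduction on the host
  print *, total
  if (total /= 500500.0) stop 1
end program main
```

Leaving the last step to the host avoids needing floating-point atomics, which compute capability 1.2 devices such as the GT 240 do not provide.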

Hopefully, this will get you started. If not, please let us know and we’ll try to help further.

Mat

l4linux · March 31, 2010, 3:44pm

Oh, yes, I'm so stupid.
Thank you all for your messages.
