Programming with two languages (Please advise)

torkyahmad · March 9, 2014, 3:57pm

Hello

I have a very specific question. My Fortran code produces the matrix and vectors of an “Ax=b” system, and I would like to solve it using a family of functions written in C that is also CUDA-enabled.

How would I start passing my values from Fortran to the C solver using PGI?
Could you please guide me on where I might find the necessary steps?

Thank you!
Ahmed

MatColgrove · March 10, 2014, 3:56pm

Hi Torkin,

I would suggest starting by reading Chapter 13, Inter-language Calling, of the PGI Compiler User’s Guide. It gives a good overview of the issues you may encounter.

Also, I’d recommend reading about the F2003 iso_c_binding intrinsic module. This module greatly simplifies interoperability with C.
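As a rough illustration of what the C side of such a call might look like, here is a minimal sketch. The function name solve_dense, its argument list, and the plain Gaussian elimination inside it are assumptions made for this example; it is not the CUDA-enabled solver from the question. The point it tries to show is that a Fortran array arrives in C as a pointer to column-major data, and that an iso_c_binding interface (shown in the comment) lets Fortran call the routine under its exact C name.

```c
#include <stddef.h>

/*
 * Hypothetical Fortran interface for the routine below (an assumption for
 * illustration, not taken from the thread):
 *
 *   interface
 *     function solve_dense(n, a, b) bind(C, name="solve_dense") result(info)
 *       use iso_c_binding, only: c_int, c_double
 *       integer(c_int), value         :: n
 *       real(c_double), intent(inout) :: a(n, n), b(n)
 *       integer(c_int)                :: info
 *     end function solve_dense
 *   end interface
 */

/* Offset of element (i, j), 0-based, in an n-by-n column-major matrix,
 * which is how a Fortran array a(n, n) is laid out in memory. */
static size_t idx(int n, int i, int j)
{
    return (size_t)j * (size_t)n + (size_t)i;
}

/* Solve A x = b in place by Gaussian elimination with partial pivoting.
 * On success returns 0 and b holds x; returns -1 for bad arguments and
 * 1 if a zero pivot is found. Both a and b are overwritten. */
int solve_dense(int n, double *a, double *b)
{
    if (n <= 0 || a == NULL || b == NULL)
        return -1;

    for (int k = 0; k < n; ++k) {
        /* Pick the row with the largest magnitude entry in column k. */
        int p = k;
        double best = a[idx(n, k, k)] < 0 ? -a[idx(n, k, k)] : a[idx(n, k, k)];
        for (int i = k + 1; i < n; ++i) {
            double v = a[idx(n, i, k)] < 0 ? -a[idx(n, i, k)] : a[idx(n, i, k)];
            if (v > best) { best = v; p = i; }
        }
        if (best == 0.0)
            return 1;

        /* Swap rows k and p of A and of b. */
        if (p != k) {
            for (int j = 0; j < n; ++j) {
                double t = a[idx(n, k, j)];
                a[idx(n, k, j)] = a[idx(n, p, j)];
                a[idx(n, p, j)] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }

        /* Eliminate column k from the rows below the pivot. */
        for (int i = k + 1; i < n; ++i) {
            double f = a[idx(n, i, k)] / a[idx(n, k, k)];
            for (int j = k; j < n; ++j)
                a[idx(n, i, j)] -= f * a[idx(n, k, j)];
            b[i] -= f * b[k];
        }
    }

    /* Back substitution on the upper-triangular system. */
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j)
            s -= a[idx(n, i, j)] * b[j];
        b[i] = s / a[idx(n, i, i)];
    }
    return 0;
}
```

With the interface above in scope, the Fortran side would reduce to something like info = solve_dense(n, a, b). One way to build it, assuming hypothetical file names solver.c and main.f90, would be to compile the C file with pgcc -c and then list the resulting object file on the pgfortran link line.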

Finally, calling CUDA C from CUDA Fortran follows the same Fortran to C conventions as above. However, I highly recommend using iso_c_binding if you’re going to call CUDA C global kernels directly, since you’ll need an interface block describing the call. While the article I wrote on this is a bit old, it does have an example of CUDA Fortran calling CUDA C.

Mat

torkyahmad · March 10, 2014, 8:52pm

Wonderful!
Thank you Mat

torkyahmad · March 14, 2014, 9:09am

Hey Mat

As is obvious, I am new to coding. I am looking at the Monte Carlo Integration example and it is very helpful in understanding many things. I am grateful for all the support.

I am just lost on one thing: could you please tell me the exact steps to get the makefile to run? Do I have to use PGI Bash (64) or will PGI Cmd (64) run it too? I change to the directory where the “pgi_mc_example” folder is, and then what?

Please help me.
Ahmed

MatColgrove · March 14, 2014, 3:44pm

Hi Ahmed,

On Windows, you’ll want to use the Cygwin (bash) environment since DOS doesn’t support Make.

I wrote that article with Linux in mind and don’t think I tested it on Windows. Though, that’s why we ship Cygwin, so folks have a Unix-like environment and access to these tools. I don’t think you’ll have a problem with the code, but you may need to adjust paths and make sure nvcc can be invoked (if you’re doing the mixed CUDA C portion). Also, you’ll most likely need to figure out what symbol name nvcc gives the random number generator routine.

I’m a bit swamped today, but if I can, I’ll try the example myself on Windows and see what I encounter.

Mat

torkyahmad · March 14, 2014, 4:20pm

I do have Cygwin. Please understand that this is my first time using it. So could you please tell me what to type to get the Makefile going?

I have failed to find anyone online who explains it clearly.

Thank you Mat

MatColgrove · March 14, 2014, 4:45pm

So could you please tell me what to type to get the Makefile going?

Type “make”.

Though, you might want to review the makefile docs for an overview.

Also, the make commands I used are all in the article.

torkyahmad · March 14, 2014, 5:01pm

Thank you for the very helpful replies Mat.
Is this the nvcc incompatibility you had predicted?
I do have MVS 2010 installed and CUDA 5.5.
Any ideas on what I could do next?
I hope that I am not boring you to death. :)

Ahmed

MatColgrove · March 14, 2014, 5:17pm

Hi Ahmed,

Is this the nvcc incompatibility you had predicted?

Nope, this is a different one. nvcc uses Microsoft’s cl compiler to compile the host portion of the code, so it expects cl to be in your PATH.

You can set the PATH to cl in the Cygwin environment via “export PATH=<path_to_cl>:$PATH” (bash separates PATH entries with a colon, not a semicolon), or edit the pgi.bat file and add it there (using DOS set commands; follow the example already in pgi.bat).

While you’ll need to reopen the Cygwin window, I’d recommend adding it to pgi.bat so it will always be set.

Mat

torkyahmad · March 14, 2014, 5:32pm

Hey Mat
Getting there… Can you see why this external symbol is not being resolved?
Ahmed

PGI$ cd Desktop/pgi_mc_example
PGI$ make clean
rm -rf *.out ./obj/*.o *.mod
PGI$ make
pgfortran -fast -c -Iinc ./src/mcUtils.F90 -o ./obj/mcUtils.o
pgfortran -fast -c -Iinc -Mpreprocess  -DITER=10  -Mconcur=innermost -Minfo=par ./src/mcCPU.F90 -o ./obj/mcCPU.o
montecarlo_cpu:
     28, Parallel code generated with block distribution for inner loop if trip count is greater than or equal to 100
     44, Loop not parallelized: may not be beneficial
pgfortran -fast -c -Iinc -Mpreprocess  -DITER=10 -DMCTYPE=0 ./src/monte_drv.F90 -o ./obj/monte_drv_cpu.o
pgfortran -fast   -Mconcur=innermost -Minfo=par ./obj/monte_drv_cpu.o ./obj/mcUtils.o ./obj/mcCPU.o -o mcCPU.out
pgfortran -ta=nvidia -fast -c -Iinc -Mpreprocess  -DITER=10 -Minfo=accel ./src/mcACC.F90 -o ./obj/mcACC.o
montecarlo_acc:
     25, Generating local(temp(:))
         Generating copyin(y(:))
         Generating copyin(x(:))
     31, Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     32, Loop is parallelizable
         Accelerator kernel generated
         32, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
             CC 1.0 : 4 registers; 64 shared, 16 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 12 registers; 4 shared, 84 constant, 0 local memory bytes; 100% occupancy
     41, Loop is parallelizable
         Accelerator kernel generated
         41, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
             CC 1.0 : 7 registers; 1088 shared, 32 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 10 registers; 1032 shared, 80 constant, 0 local memory bytes; 100% occupancy
         42, Sum reduction generated for suma
         43, Sum reduction generated for sumsq
pgfortran -ta=nvidia -fast -c -Iinc -Mpreprocess  -DITER=10 -Minfo=accel -DMCTYPE=1 ./src/monte_drv.F90 -o ./obj/monte_drv_acc.o
pgfortran -fast  -ta=nvidia ./obj/monte_drv_acc.o ./obj/mcUtils.o ./obj/mcACC.o -o mcACC.out
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 ./src/mcCUF_1.F90 -o ./obj/mcCUF_1.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 -DMCTYPE=11 ./src/monte_drv.F90 -o ./obj/monte_drv_cuf1.o
pgfortran -fast  -Mcuda ./obj/monte_drv_cuf1.o ./obj/mcUtils.o ./obj/mcCUF_1.o -o mcCUF_1.out
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 ./src/mcCUF_2.F90 -o ./obj/mcCUF_2.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 -DMCTYPE=12 ./src/monte_drv.F90 -o ./obj/monte_drv_cuf2.o
pgfortran -fast  -Mcuda ./obj/monte_drv_cuf2.o ./obj/mcUtils.o ./obj/mcCUF_2.o -o mcCUF_2.out
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 ./src/mcCUF_3.F90 -o ./obj/mcCUF_3.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 -DMCTYPE=13 ./src/monte_drv.F90 -o ./obj/monte_drv_cuf3.o
pgfortran -fast  -Mcuda ./obj/monte_drv_cuf3.o ./obj/mcUtils.o ./obj/mcCUF_3.o -o mcCUF_3.out
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 ./src/mcCUF_4.F90 -o ./obj/mcCUF_4.o
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 -DMCTYPE=14 ./src/monte_drv.F90 -o ./obj/monte_drv_cuf4.o
pgfortran -fast  -Mcuda ./obj/monte_drv_cuf4.o ./obj/mcUtils.o ./obj/mcCUF_4.o -o mcCUF_4.out
pgfortran -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 ./src/mcCUF_5.F90 -o ./obj/mcCUF_5.o
pgfortran -DUSE_GPU_RNG -Mcuda -fast -c -Iinc -Mpreprocess  -DITER=10 -DMCTYPE=15 ./src/monte_drv.F90 -o ./obj/monte_drv_cuf5.o
nvcc -O3 -c -Iinc ./src/MersenneTwister_kernel.cu -o ./obj/MersenneTwister_kernel.o
pgfortran -fast  -Mcuda ./obj/monte_drv_cuf5.o ./obj/mcUtils.o ./obj/mcCUF_5.o ./obj/MersenneTwister_kernel.o -o mcCUF_5.out
mcCUF_5.o : error LNK2019: unresolved external symbol randomgpu__entry referenced in function mccuf_5_montecarlo_cuf5_
mcCUF_5.out : fatal error LNK1120: 1 unresolved externals
make: *** [mcCUF_5.out] Error 2
PGI$

torkyahmad · March 14, 2014, 7:03pm

Dear Mat

If these questions are becoming too immature, please let me know.

I only started coding 3 months ago, so yes, I am pretty new.

I will stop asking such basic questions if you tell me that the answers can be found elsewhere in a course or documentation.

Thank you for your unwavering support

Ahmad

torkyahmad · March 14, 2014, 10:32pm

My query was posted earlier, thank you