Allocate memory in structure of arrays (CUDA Fortran)

Hello.

I’m trying to compile and run the following code:

1.cuf :

module fluid

  type omg
    real, allocatable :: sad(:)
    real, allocatable :: saf
  end type

end module

program broken_derivative_types
  use cudafor
  use cudadevice
  use fluid
  implicit none
  integer :: istat
  type (omg) :: h
  type (omg), device :: d
  real, allocatable, device :: d_z(:)
  real, allocatable :: h_z(:)

print *, "dbg"
  istat = cudaMalloc(d_z, 10)
  allocate(h_z(10))
  d_z = 10.0
  h_z = d_z
  print *, h_z
print *, "dbg2"
  allocate(h%sad(10))
print *, "dbg3"
  istat = cudaMalloc(d%sad, 10)
  allocate(d%sad(10))
print *, "istat: ", istat
  d%sad = 10.0
print *, "dbg4"
  h%sad = d%sad
print *, h%sad
end program broken_derivative_types

Here is the output:

 dbg
    10.00000        10.00000        10.00000        10.00000     
    10.00000        10.00000        10.00000        10.00000     
    10.00000        10.00000    
 dbg2
 dbg3
0: copyout Memcpy (host=0xbfc5de10, dev=0x809d3c4, size=76) FAILED: 11(invalid argument)

And if I uncomment this line:

!  istat = cudaMalloc(d%sad, 10)

Then I get the following error from the compiler:

PGF90-S-0155-Could not resolve generic procedure cudamalloc (1.cuf: 30)
  0 inform,   0 warnings,   1 severes, 0 fatal for broken_derivative_types

pgfortran version is 11.9:

pgfortran 11.9-0 32-bit target on x86 Linux -tp core2 
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2011, STMicroelectronics, Inc.  All Rights Reserved.
PGF90/x86 Linux 11.9-0
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2011, STMicroelectronics, Inc.  All Rights Reserved.

If possible, I would like to know where I made the mistake (and to see correct code for managing memory with derived types).
Thanks in advance.

Okay, that’s too much.
When I’m using a structure with variables in it:

dimen = 1, prk = 4;

  type Fluidstemp
    sequence
    real(kind=prk) :: energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type

and then add a variable like this:

  type Fluidstemp
    sequence
    real(kind=prk) :: z, energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type

I run into a problem that is best described by these two plots:

Structure without additional variable: http://i.imgur.com/qVNAC.png
With: http://i.imgur.com/SD2C3.png

Here is the full code: coursework_2011.cuf (hosted on Pastebin.com)

pgf90 -V:

pgf90 11.9-0 32-bit target on x86 Linux -tp core2 
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2011, STMicroelectronics, Inc.  All Rights Reserved.

Compiling with:
pgf90 -rc=rc4.0 -m32 coursework_2011_cuda_tesla_1dv2.cuf csort.o -o b.out

rc4.0:

set CUDAROOT=/opt/cuda;
set CUDAVERSION=4.0;

csort.o was obtained by executing:

nvcc -m32 -c -arch sm_13 csort.cu

csort.cu: (hosted on Pastebin.com)

I think that’s a fairly serious bug.
Thanks in advance.

Hi Brute1k,

In your first example, this is a known limitation of CUDA Fortran. The size of a device type must be known when it is allocated. Hence, user-defined types used as device types can only be composed of basic types or fixed-size arrays.
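One possible way around this limitation, offered only as a sketch and not confirmed anywhere in this thread, is to keep the derived-type variable itself on the host and give the device attribute to its allocatable component instead. The program name `device_component_sketch` and the array `h_sad` are made up for illustration, and whether this compiles with a given PGI release (such as 11.9) is an assumption that needs checking.

```fortran
module fluid
  type omg
    ! Assumption: the compiler accepts the device attribute on an
    ! allocatable component of a host-resident derived type.
    real, allocatable, device :: sad(:)
    real :: saf
  end type
end module

program device_component_sketch
  use cudafor
  use fluid
  implicit none
  type (omg) :: d                 ! the type instance stays in host memory
  real, allocatable :: h_sad(:)   ! hypothetical host buffer for the copy back

  allocate(d%sad(10))             ! allocates the component in device memory
  allocate(h_sad(10))

  d%sad = 10.0                    ! fill the device array from the host
  h_sad = d%sad                   ! device-to-host copy by assignment
  print *, h_sad

  deallocate(d%sad)
  deallocate(h_sad)
end program device_component_sketch
```

With this layout only the array data lives on the GPU, so the size of the type does not have to be known on the device. The component would then be passed to kernels as an ordinary device array argument (for example `d%sad`) rather than passing the whole structure.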

As for the second issue, I was able to compile and run your program. However, I’m not sure how to determine whether I’m getting wrong answers. The program writes about 9000 text files, but I’m not sure what to do with them. Can I compare a particular file, or do I need to plot them?

I did compare the two resulting CUDA C files (-Mcuda=keepgpu), with and without “z”, but don’t see anything obvious.

Finally what is the contents of your “rc4.0” file?

Thanks,
Mat

Okay, that’s clear to me now. Still, is there any way to lift this limitation, or at least to work around it (other than using only basic types or hard-coding array bounds), in the near future?

Oh, that’s my fault. I’m really sorry about so much output.
Generally, you need gnuplot or any other program that understands 2-column plot data (just “x” and “y” values). Then you need to plot all values (p_, V_, Rho_ with the same number) in one graph. I’ve written a small .sh script that generates the plots for you, provided you have a Linux environment with gnuplot installed: (bash script hosted on Pastebin.com)
Just place it in the directory where your output is generated and run it. It should generate 3000 *.png files (that should be enough to see the evolution). Here is the crucial line that sets the correct x-axis limits:

xrangemx=`cat coursework_2011_cuda_tesla_1dv2.cuf | grep inparams%dx | awk '{print $3}'`*`cat coursework_2011_cuda_tesla_1dv2.cuf | grep 'integer, parameter :: grid' | awk '{print $6}'`;

I hope you are familiar with bash scripting; if not, just replace “coursework_2011_cuda_tesla_1dv2.cuf” here with the name of your source file.

Then you can copy, for example, “plot0140.png” to another location, change the structure Fluidstemp as described in my earlier post, run the tests again, regenerate the graphs, and compare the new “plot0140.png” with the old one. They should be exactly the same (because in fact we didn’t change anything except the memory layout of the structure).
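To make the "only the layout changed" point concrete, the following host-only sketch prints the storage size of both variants of the structure. It is plain Fortran (it needs a compiler with the Fortran 2008 `storage_size` intrinsic), the type names `without_z` and `with_z` are made up for illustration, and it says nothing about the cause of the wrong results; it only shows what differs between the two builds.

```fortran
program layout_check
  implicit none
  integer, parameter :: dimen = 1, prk = 4

  ! Hypothetical copy of Fluidstemp as originally declared.
  type without_z
    sequence
    real(kind=prk) :: energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type

  ! Hypothetical copy of Fluidstemp with the extra member z.
  type with_z
    sequence
    real(kind=prk) :: z, energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type

  type (without_z) :: a
  type (with_z)    :: b

  ! storage_size returns the size in bits; divide by 8 for bytes.
  print *, 'bytes without z: ', storage_size(a) / 8
  print *, 'bytes with z:    ', storage_size(b) / 8
end program layout_check
```

With 4-byte reals and no padding, one would expect the second type to be one real larger than the first, which shifts the offset of every following member and changes the stride between array elements of that type.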

Well, I can tell you that the error happens in the global subroutine ‘cuda_calculate_flux_tvd’.
This really bugs me. I can’t tell why the variable Fl%density (or Fr%density) does not keep its value (and just contains garbage in the output) in this case (change the corresponding lines to get the buggy output):

571: Fr%density = 0.0
590: outfluxes(dims)%density(idx) = Fr%density ! for example
! dims = 1 anyway, you can just change it to outfluxes(1)

Aaand… the output in outfluxes(1)%density(:) will NOT be 0.0.
The debug code is in lines 1046 to 1049.
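The experiment above could be isolated from the rest of the program with a small stand-alone test along these lines. This is only a sketch: the module `repro`, the kernel `write_density`, and the arrays `d_out` and `h_out` are made up for illustration, and it has not been run against the compiler version discussed here.

```fortran
module repro
  use cudafor
  implicit none
  integer, parameter :: dimen = 1, prk = 4

  type Fluidstemp
    sequence
    real(kind=prk) :: z, energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type

contains

  ! Hypothetical kernel: set a member of a thread-local structure and
  ! copy it straight to the output array, as in the two lines above.
  attributes(global) subroutine write_density(out, n)
    integer, value :: n
    real(kind=prk) :: out(n)
    type (Fluidstemp) :: Fr
    integer :: idx

    idx = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (idx <= n) then
      Fr%density = 0.0
      out(idx) = Fr%density
    end if
  end subroutine write_density

end module repro

program check_density
  use cudafor
  use repro
  implicit none
  integer, parameter :: n = 64
  real(kind=prk), device :: d_out(n)
  real(kind=prk) :: h_out(n)

  d_out = -1.0                      ! sentinel so untouched elements stand out
  call write_density<<<1, n>>>(d_out, n)
  h_out = d_out                     ! device-to-host copy by assignment
  print *, 'max |density| = ', maxval(abs(h_out))
end program check_density
```

If a test this small prints a nonzero value, it would be a compact reproducer to send to the compiler vendor; if it prints zero, the problem more likely depends on something else in ‘cuda_calculate_flux_tvd’.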

As for the contents of rc4.0, they are in my previous post:

set CUDAROOT=/opt/cuda;
set CUDAVERSION=4.0;



Thank you too! I’m really glad you’ve answered me.

Hi Brute11k,

FYI, now that I’m back in the office I’ll have more time to figure this one out. For some reason, the “non-z” version of the program keeps hanging for me around iteration 5512254. Not sure why.

I’ll keep you posted as I investigate. I think my next step is to remove thrust and put in my own sort routines. This will allow me to run the program in emulation mode and the debugger.

Mat