atomic min/max for real data

Tuan · February 4, 2010, 4:21pm

Currently, atomic min only works for INTEGER data. For real data, what would you suggest if I want to find the minimum value of an array in which each element is processed by one instance of the kernel?

Tuan

MatColgrove · February 4, 2010, 5:07pm

Hi Tuan,

Atomic operations need hardware support, and NVIDIA doesn’t support atomic min operations on floats, so we’re limited in what we can do here. Do you really need the operation to be atomic, or could the elemental min function work?

Mat

Tuan · February 4, 2010, 5:08pm

The elemental min function would be fine. Do you have any suggestions, Mat?

Tuan

MatColgrove · February 4, 2010, 5:18pm

If the elemental min function works, then I’d use it. If you must use the atomic min, then you’d need to change your array from REAL to INTEGER*4.
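One way to read the INTEGER*4 suggestion, if the data must stay real, is to map each 32-bit real to an integer key whose ordering matches the ordering of the reals and then take the integer minimum of the keys. The sketch below is not from this thread: it runs on the host in standard Fortran, the helper names `to_key` and `from_key` are hypothetical, it assumes IEEE 754 single precision, and it ignores NaNs. In a kernel, the `min` in the loop would be replaced by an integer atomic min on a device variable, assuming the same bit reinterpretation is available in device code.

```fortran
program float_key_min
  use iso_fortran_env, only: int32, real32
  implicit none
  real(real32)   :: a(6), recovered
  integer(int32) :: best
  integer        :: i

  a = [3.5, -2.25, 0.0, 17.0, -8.125, 1.0e-3]

  ! Start from the largest possible key so that any element replaces it.
  best = huge(best)
  do i = 1, size(a)
     ! On the GPU this step would be an integer atomic min on a shared location.
     best = min(best, to_key(a(i)))
  end do

  ! Convert the winning key back to a real and compare with minval.
  recovered = from_key(best)
  print *, recovered, minval(a)

contains

  ! Hypothetical helper: map a 32-bit real to an integer with the same ordering.
  pure function to_key(x) result(k)
    real(real32), intent(in) :: x
    integer(int32) :: k
    k = transfer(x, 0_int32)
    ! Negative reals sort in reverse as raw bits, so flip the low 31 bits.
    if (k < 0) k = ieor(k, huge(k))
  end function to_key

  ! Hypothetical helper: inverse of to_key (the mapping is its own inverse).
  pure function from_key(k) result(x)
    integer(int32), intent(in) :: k
    real(real32) :: x
    integer(int32) :: bits
    bits = k
    if (bits < 0) bits = ieor(bits, huge(bits))
    x = transfer(bits, 0.0_real32)
  end function from_key

end program float_key_min
```

Both printed values should agree. The key mapping keeps the sign bit unchanged, which is why applying the same flip a second time recovers the original bit pattern.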

Mat

Tuan · February 10, 2010, 10:10pm

Hi Mat,

The reason is that I want the min to run on the GPU. However, I’m not sure whether the elemental min guarantees the true minimum value across the threads. Could you please confirm this?

Example: suppose minval was assigned the MAXIMUM value before calling the subroutine:

attributes(global) subroutine foo(A, N, minval)
 real, dimension(N,N) :: A
 real :: minval

 idx = threadIdx%x;
 if (idx .le. N) then
   min1 = min(A(idx,:))
   minval = min(min1, minval)
 endif

end subroutine

Tuan

MatColgrove · February 12, 2010, 12:20am

Hi Tuan,

For arrays, you want to use the reduction intrinsic “minval”. Something like:

% cat minreduc.cuf

module minutil

  real, device :: dMinval
  real, device :: dMaxval

contains

  attributes(global) subroutine foo (Ad,N)
    use cudafor
    implicit none

    integer, value :: N
    real, device, dimension(N,N) :: Ad
    integer i, j, tx, ty

    tx = threadidx%x
    ty = threadidx%y
    i = (blockidx%x-1)*16 + tx
    j = (blockidx%y-1)*16 + ty

    Ad(i,j) =  (N*(i-1))+(j-1)

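    ! syncthreads() only synchronizes the threads within one block, so with
    ! this multi-block grid the reduction below can run before other blocks
    ! have finished writing Ad.
    call syncthreads()
    if (i .eq. 1 .and. j .eq. 1 ) then
       dMinval = minval(Ad)
       dMaxval = maxval(Ad)
    endif
    call syncthreads()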

  end subroutine foo

  subroutine testmin ()
    use cudafor
    implicit none
    integer, parameter :: N = 64
    real, dimension(N,N) :: A
    real, device, dimension(N,N) :: Ad
    real :: minval, maxval
    type(dim3) :: dimGrid, dimBlock
    A=-1
    dMinval = -1
    dimGrid=dim3(N/16,N/16,1)
    dimBlock=dim3(16,16,1)
    Ad=0
    call foo<<<dimGrid,dimBlock>>>(Ad,N)
    A=Ad
    minval = dMinval
    maxval = dMaxval
    print *, minval, maxval
    print *, A(1,1), A(N,N)
  end subroutine testmin

end module minutil

program testme

  use minutil

  call testmin

end program testme

% pgfortran -o minreduc.out minreduc.cuf
% minreduc.out
    0.000000        4095.000
    0.000000        4095.000

Tuan · February 17, 2010, 11:47pm

Hi Mat,
Does minval support double precision as well? The documentation only says it supports real, so I’m not sure whether that includes double precision.
Could you also explain the difference between min and minval? Can min not be used on the GPU?

Thanks,
Tuan

MatColgrove · February 18, 2010, 3:14am

Hi Tuan,

Does minval support double precision as well? The documentation only says it supports real, so I’m not sure whether that includes double precision.

Yes, minval supports double precision. When the docs say ‘real’, they mean both kinds.

Could you also explain the difference between min and minval?

min determines which of two or more scalar values is the minimum. minval finds the minimum value of an array.

Can min not be used on the GPU?

min can be used on the GPU. However, your code is trying to find the minimum value of an array, hence the use of minval.
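A small host-side illustration of the distinction, written in standard Fortran as a sketch rather than taken from the thread (the array values are made up for the example):

```fortran
program min_vs_minval
  implicit none
  real             :: a(4)
  double precision :: d(3)

  a = [4.0, -1.5, 9.0, 2.0]
  d = [1.0d0, -3.0d0, 2.5d0]

  ! min compares the scalar arguments it is given.
  print *, min(a(1), a(2), a(3))

  ! minval reduces a whole array to a single value.
  print *, minval(a)

  ! minval also accepts double precision arrays.
  print *, minval(d)

  ! With an array argument, min works element by element and returns an array.
  print *, min(a, 0.0)
end program min_vs_minval
```

The last line shows why min alone does not reduce an array: called with an array, it returns another array of the same shape instead of a single minimum.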

Mat

Tuan · February 18, 2010, 3:34am

All clear. Thanks, Mat.

Tuan
