Convert HOST Loop to CUDA

rhaney · September 5, 2019, 6:59pm

Good Afternoon,

This may be a simple/stupid question, but can anyone tell me how to convert a loop into a CUDA kernel using the following simple/toy C-code?

#include <stdio.h>
#include <stdlib.h>  // malloc, free

int main(int argc, char **argv){
  float result1 = 0.0f;
  float result2 = 0.0f;
  float *array;
  int numElems = 128;

  array = (float*)malloc(numElems*sizeof(float));
  // initialize with sample data
  for(int i = 0; i < numElems; ++i){
    array[i] = i * 1.5f;
  }

  // how to convert this to CUDA ?
  for(int i = 0; i < numElems; ++i){
    result1 = result1 * array[i];
    result2 = result2 * (array[i]*0.5f);
  }

  // print the results so they can be compared against a GPU version
  printf("result1 = %f, result2 = %f\n", result1, result2);

  free(array);

  return 0;
}

In the end I don’t want to modify the input array to compute the final result1 and result2 variables.

I have a feeling that the loop defined in the above C-code may not be as straightforward as it seems.

Thanks in advance for any help.

Robert_Crovella · September 5, 2019, 8:28pm

You don’t need CUDA or a loop to process this. The answer is zero for both result1 and result2.

In the more general case where you have initialized result1 and result2 to something other than zero (and, for completeness, let's assume your array is not a simple arithmetic sequence), this can be accomplished with a parallel reduction.

Considering the calculation for result1, the only difference from the typical treatise on parallel reduction:

https://developer.download.nvidia.com/assets/cuda/files/reduction.pdf

is that you are taking the product in every case, instead of the sum. If you sort that out, I think the method to calculate result2 will be immediately obvious.
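As a rough illustration of the idea, here is a host-only C sketch of a product reduction arranged as a tree of pairwise multiplications. It is not the kernel from the linked slides; the function name `tree_product` and the `scale` parameter are hypothetical, introduced only for this example. The point is that the two halves of each range are independent of one another, which is the property a GPU reduction exploits.

```c
#include <assert.h>
#include <stddef.h>

/* Product of (in[i] * scale) for i in [lo, hi), evaluated as a balanced
 * tree of pairwise multiplications. The left and right halves do not depend
 * on each other, so a parallel machine could compute them at the same time.
 * The input array is only read, never written. */
static float tree_product(const float *in, size_t lo, size_t hi, float scale)
{
    assert(lo <= hi);
    if (hi == lo) {
        return 1.0f;                 /* empty range: multiplicative identity */
    }
    if (hi - lo == 1) {
        return in[lo] * scale;       /* leaf: one (scaled) element */
    }
    size_t mid = lo + (hi - lo) / 2; /* split the range in two */
    return tree_product(in, lo, mid, scale) * tree_product(in, mid, hi, scale);
}
```

With the variables from the question, result1 would be its initial value times `tree_product(array, 0, numElems, 1.0f)`, and result2 its initial value times `tree_product(array, 0, numElems, 0.5f)`. Because floating-point multiplication is not associative, the tree ordering can round differently from the sequential loop.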

rhaney · September 6, 2019, 11:58am

Thanks,

You are correct, reduction is the likely answer. Also, thank you for the link.

rhaney · September 6, 2019, 1:10pm

The information for reduction was very good - thank you.

I know that CUB has some operations for something similar - do you have any recommendation(s) regarding using CUB to reduce? Any sample/example code using CUB for the type of reduction(s) I am attempting would be greatly appreciated.

Thanks again.

Robert_Crovella · September 6, 2019, 2:27pm

Here’s a sample code with cub reduce that shows use of a user-supplied reduction operator:

CUB: cub::DeviceReduce Struct Reference

So make that operator multiplication.
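To make the "user-supplied operator" idea concrete, here is a host-only C analogy, assuming nothing about CUB's actual interface: a reduction that takes the binary operator and the initial value as arguments. The names `binary_op`, `reduce`, `op_mul`, and `op_min` are hypothetical and exist only in this sketch; swapping the operator is all it takes to turn one kind of reduction into another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator type: combines two floats into one. */
typedef float (*binary_op)(float, float);

/* Two example operators; only the operator changes between reductions. */
static float op_mul(float a, float b) { return a * b; }
static float op_min(float a, float b) { return b < a ? b : a; }

/* Sequential reference reduction: folds op over in[0..n), starting from
 * init. The input is const, so it cannot be modified here. */
static float reduce(const float *in, size_t n, binary_op op, float init)
{
    assert(in != NULL || n == 0);
    float acc = init;
    for (size_t i = 0; i < n; ++i) {
        acc = op(acc, in[i]);
    }
    return acc;
}
```

For a product, the natural initial value is 1.0f (the multiplicative identity), for example `reduce(array, numElems, op_mul, 1.0f)`.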

To do this in the complex domain, the easiest way would be to include

#include <thrust/complex.h>

in your code. A bit of googling will show you how to use thrust::complex.

rhaney · September 6, 2019, 2:35pm

Thanks again for all help - it is greatly appreciated.

Unfortunately I can’t use thrust.

The one caveat I have with regard to reduction is that the array elements can’t be modified. I suppose I could make a copy of the array and use that for the reduction, but that seems like a lot of extra overhead?

Robert_Crovella · September 6, 2019, 2:40pm

You can construct your own complex multiply operator without thrust. Read about complex number arithmetic on Wikipedia.

The reduction operation does not modify the array elements.
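A minimal host-side sketch of such an operator in plain C, using the standard rule (a + bi)(c + di) = (ac - bd) + (ad + bc)i. The type `cplx` and the functions `cplx_mul` and `cplx_product` are hypothetical names for this example, not part of CUB or thrust. The input pointer is declared const, which illustrates that a reduction only reads its input and accumulates into a separate value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type: real and imaginary parts as floats. */
typedef struct { float re, im; } cplx;

/* Complex multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i. */
static cplx cplx_mul(cplx a, cplx b)
{
    cplx r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* Product of all n elements, starting from 1 + 0i (the multiplicative
 * identity). The input is read-only; the running product is kept in a
 * local accumulator. */
static cplx cplx_product(const cplx *in, size_t n)
{
    assert(in != NULL || n == 0);
    cplx acc = {1.0f, 0.0f};
    for (size_t i = 0; i < n; ++i) {
        acc = cplx_mul(acc, in[i]);
    }
    return acc;
}
```

The same `cplx_mul` logic could serve as the body of a device-side operator, though the exact form a library expects would need to be checked against its documentation.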

rhaney · September 6, 2019, 2:47pm

Cool. I guess I need to read up more before asking, thank you for being patient.

rhaney · September 6, 2019, 3:39pm

Quick question: Is there any problem with passing an array to CUB reduction that is already allocated on the DEVICE?

Thanks again.

Robert_Crovella · September 6, 2019, 3:46pm

That would be the only way to use cub.

rhaney · September 6, 2019, 3:50pm

Cool. Just making sure. Thank you

rhaney · September 6, 2019, 4:55pm

I got it working, thank you very much Robert_Crovella. I posted the final working version at:

CUB reduction with complex number and multiplication - GPU-Accelerated Libraries - NVIDIA Developer Forums

Just in case others could make use of the solution.
