Convert MATLAB array multiplication and sum function to a CUDA equivalent

I am trying to convert this MATLAB code into its CUDA equivalent:

[codebox]isize = 20;
n = 7;
for i = 1:n   %% 7x7 xcorr
    for j = 1:n
        xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1,j:j+isize-1)));   %% ref is a 676-element (26x26) array, ffcorr1 is a 400-element (20x20) array
    end
end[/codebox]

Can someone point me to how sum(sum(array)) can be implemented in CUDA? What is the correct way to write this kernel in CUDA?

sum(sum(array)) should just be a simple reduction.

You can do it yourself by using the code in the Parallel Reduction example @

http://developer.download.nvidia.com/compu…Algorithms.html

Or you can use CUDPP @

http://code.google.com/p/cudpp/

An example of calculating the sum of an array can be found @

http://cudpp.googlecode.com/svn/tags/1.1.1…implecudpp.html

Or you could use Thrust @

http://code.google.com/p/thrust/

The second example on the main page will show you how to do it.


I am new to Thrust and, after going through the examples, tried this:

for(i = 0; i < npix*npix; ++i)
{
    thrust::transform(ref_d.begin()+i, ref_d.end()+i, ffcorr1.begin(), vec_pp.begin(), thrust::multiplies<double>());
}

for(i = 0; i < npix*npix; ++i)
{
    vec_sum[i] = thrust::reduce(vec_pp.begin()+i, vec_pp.end()+i);
}

But this does not produce correct results. Only the i = 0 case gives the right answer. Can the thrust::transform call be modified to model this behavior?


Is the result after the multiplication wrong as well? I think the problem might be the indexing in your second loop (note that thrust::reduce takes an exclusive end iterator). Can you try:

for(i = 0; i < npix; ++i)
{
    vec_sum[i] = thrust::reduce(vec_pp.begin()+i*npix, vec_pp.begin()+(i+1)*npix);
}


The answers after the multiplication are wrong.

As I understand it, I need to multiply 400 values of ref_d with the 400 values of ffcorr1. But ref_d has 676 values, so for each multiplication I want to shift the window of ref_d values: 0 to 400, then 7 to 407, and so on.

Should I be using a permutation_iterator?


Hello,

With Jacket (http://www.accelereyes.com), you can get a CUDA version of this directly in M, as follows:

[codebox]
isize = 20;
n = 7;
ffcorr1 = gdouble(ffcorr1); ref = gdouble(ref);   % only need to add one line of code
for i = 1:n   %% 7x7 xcorr
    for j = 1:n
        xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1,j:j+isize-1)));   %% ref is a 676-element array and ffcorr1 is a 400-element array
    end
end
[/codebox]

You only need to add one line of code and you’re done. You might also play with GFOR (http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage) to accelerate and auto-vectorize the inner loop. We’re happy to help you if you have any questions: support@accelereyes.com

Best,

John


MATLAB is not installed on the machine with the GPU, so I will have to rely on Thrust or CUDPP to implement this.
