I am trying to convert this Matlab code into its CUDA equivalent:
[codebox]isize = 20;
n = 7;
for i = 1:n % 7x7 cross-correlation
    for j = 1:n
        xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1,j:j+isize-1))); % ref is a 26x26 (676-element) array and ffcorr1 is a 20x20 (400-element) array
    end
end[/codebox]
Can someone show me how sum(sum(array)) can be implemented in CUDA? What is the correct way to write this kernel?
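Before porting, it may help to have the loop in plain C++ as a reference for checking any GPU version. The sketch below is a CPU reference, not a CUDA kernel. It assumes ref (26x26) and ffcorr1 (20x20) are stored column-major in flat arrays, as Matlab stores them, and it shifts Matlab's 1-based i and j to 0-based. The name xcorr_reference is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for the Matlab loop
//   xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1, j:j+isize-1)))
// Assumes column-major storage, as in Matlab: element (r, c) of a matrix
// with R rows lives at flat index r + c*R. All indices here are 0-based.
std::vector<double> xcorr_reference(const std::vector<double>& ref,      // refDim x refDim
                                    const std::vector<double>& ffcorr1,  // isize x isize
                                    std::size_t isize, std::size_t n) {
    const std::size_t refDim = isize + n - 1;  // 20 + 7 - 1 = 26
    std::vector<double> xcout(n * n, 0.0);      // n x n result, column-major
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            double acc = 0.0;  // sum(sum(...)) over the whole isize x isize window
            for (std::size_t c = 0; c < isize; ++c) {
                for (std::size_t r = 0; r < isize; ++r) {
                    acc += ffcorr1[r + c * isize] * ref[(i + r) + (j + c) * refDim];
                }
            }
            xcout[i + j * n] = acc;
        }
    }
    return xcout;
}
```

The 49 outputs are independent of each other, so one possible CUDA mapping is one thread block per output, with each block reducing its 400 products.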
erike
August 16, 2010, 5:19am
#2
I am trying to convert this Matlab code into its CUDA equivalent:
[codebox]isize = 20;
n = 7;
for i = 1:n %%7x7 xcorr
for j = 1:n
xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1,j:j+isize-1))); %%ref is 676 element array and ffcorr1 is a 400 element array
end
end[/codebox]
Can someone point me how the sum(sum(array)) can be implemented in CUDA?? What is the correct way to write this kernel in CUDA ??
sum(sum(array)) is just a simple reduction.
You can write it yourself using the code in the Parallel Reduction example @
http://developer.download.nvidia.com/compu…Algorithms.html
Or you can use CUDPP @
http://code.google.com/p/cudpp/
An example of calculating the sum of an array can be found @
http://cudpp.googlecode.com/svn/tags/1.1.1…implecudpp.html
Or you could use Thrust @
http://code.google.com/p/thrust/
The second example on its main page shows how to do it.
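Since sum(sum(A)) of a 2-D array is just the sum of all its elements, it can be treated as one flat reduction. The sketch below emulates on the CPU the shared-memory tree pattern with sequential addressing, one of the variants discussed in the parallel-reduction material; it illustrates the idea and is not the SDK code itself. tree_reduce is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Sequential emulation of a shared-memory tree reduction. On the GPU, each
// pass of the inner loop is done by threads tid < stride in parallel, and
// passes are separated by __syncthreads(); here the "threads" run in turn.
double tree_reduce(const std::vector<double>& in) {
    // Pad to a power of two with zeros so every pass halves cleanly.
    std::size_t size = 1;
    while (size < in.size()) size <<= 1;
    std::vector<double> buf(size, 0.0);  // plays the role of __shared__ memory
    for (std::size_t k = 0; k < in.size(); ++k) buf[k] = in[k];

    for (std::size_t stride = size / 2; stride > 0; stride >>= 1) {
        for (std::size_t tid = 0; tid < stride; ++tid) {
            buf[tid] += buf[tid + stride];  // each "thread" folds in its partner
        }
        // In a real kernel, a __syncthreads() would go here.
    }
    return buf[0];  // the total ends up in slot 0
}
```

With Thrust, thrust::reduce performs the whole reduction on the device in a single call, so this loop only matters if you write the kernel yourself.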
erike
August 16, 2010, 5:19am
#3
I am trying to convert this Matlab code into its CUDA equivalent:
[codebox]isize = 20;
n = 7;
for i = 1:n %%7x7 xcorr
for j = 1:n
xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1,j:j+isize-1))); %%ref is 676 element array and ffcorr1 is a 400 element array
end
end[/codebox]
Can someone point me how the sum(sum(array)) can be implemented in CUDA?? What is the correct way to write this kernel in CUDA ??
sum(sum(array)) should just be a simple reduction.
You can do it yourself by using the code in the Parallel Reduction example @
http://developer.download.nvidia.com/compu…Algorithms.html
Or you can use CUDPP @
http://code.google.com/p/cudpp/
An example of calculating the sum of an array can be found @
http://cudpp.googlecode.com/svn/tags/1.1.1…implecudpp.html
Or you could use Thrust @
http://code.google.com/p/thrust/
The second example on the main page will show you how to do it.
I am new to Thrust, and after going through the examples I tried this:
for(i = 0; i < npix*npix; ++i)
{
    thrust::transform(ref_d.begin()+i, ref_d.end()+i, ffcorr1.begin(), vec_pp.begin(), thrust::multiplies<double>());
}
for(i = 0; i < npix*npix; ++i)
{
    vec_sum[i] = thrust::reduce(vec_pp.begin()+i, vec_pp.end()+i);
}
But this does not produce correct results; only the i = 0 case is right. Can thrust::transform be modified to reproduce the shifting-window behavior of the Matlab code?
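Several things go wrong in the attempt above. The range ref_d.begin()+i to ref_d.end()+i always covers all 676 elements of ref_d and, for i > 0, runs i elements past its end; assuming ffcorr1 holds 400 values, the transform also reads past the end of ffcorr1; and every pass writes to vec_pp.begin(), so only the last shift survives. The deeper problem is that a 20x20 window of the 26x26 ref is not one contiguous block of memory, so no single offset can describe it. Assuming column-major storage, each window column is contiguous, so a per-column multiply-and-sum works with ordinary iterator offsets. The C++ sketch below (window_dot_by_columns is a hypothetical name) shows those offsets; thrust::inner_product takes the same (first1, last1, first2, init) arguments.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// A 20x20 window of the 26x26 ref is 20 contiguous column segments
// (column-major storage), spaced refDim apart. Each window column is
// multiplied against the matching ffcorr1 column and summed.
double window_dot_by_columns(const std::vector<double>& ref,
                             const std::vector<double>& ffcorr1,
                             std::size_t i, std::size_t j,  // 0-based window origin
                             std::size_t isize, std::size_t refDim) {
    double acc = 0.0;
    for (std::size_t c = 0; c < isize; ++c) {
        // Start of window column c inside ref: row i, column j + c.
        auto refCol = ref.begin() + static_cast<std::ptrdiff_t>(i + (j + c) * refDim);
        // Start of column c of the isize x isize filter.
        auto filtCol = ffcorr1.begin() + static_cast<std::ptrdiff_t>(c * isize);
        // Multiply-and-sum isize contiguous elements (half-open range).
        acc = std::inner_product(filtCol, filtCol + static_cast<std::ptrdiff_t>(isize),
                                 refCol, acc);
    }
    return acc;
}
```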
I am new to Thrust and after going thrugh the examples tried this:
for(i = 0; i < npix*npix; ++i)
{
thrust::transform(ref_d.begin()+i, ref_d.end()+i, ffcorr1.begin(), vec_pp.begin(), thrust::multiplies<double>());
}
for(i = 0; i < npix*npix; ++i)
{
vec_sum[i] = thrust::reduce(vec_pp.begin()+i, vec_pp.end()+i);
}
But this does not produce correct results. When i = 0, the answer is right. Can the thrust::transform be modified to model this behavior?
erike
August 16, 2010, 8:52pm
#6
I am new to Thrust and after going thrugh the examples tried this:
for(i = 0; i < npix*npix; ++i)
{
thrust::transform(ref_d.begin()+i, ref_d.end()+i, ffcorr1.begin(), vec_pp.begin(), thrust::multiplies<double>());
}
for(i = 0; i < npix*npix; ++i)
{
vec_sum[i] = thrust::reduce(vec_pp.begin()+i, vec_pp.end()+i);
}
But this does not produce correct results. When i = 0, the answer is right. Can the thrust::transform be modified to model this behavior?
Is the result after the multiplication wrong as well? I think the problem might be the indexing in your second loop. Can you try:
for(i = 0; i < npix; ++i)
{
    vec_sum[i] = thrust::reduce(vec_pp.begin()+i*npix, vec_pp.begin()+(i+1)*npix-1);
}
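One detail in the suggested loop: iterator ranges in the STL and in Thrust are half-open, [first, last), so ending at begin()+(i+1)*npix-1 leaves out the last element of each segment. A sketch of summing equal-length segments with the end iterator one past each segment follows; segment_sums is a hypothetical helper, and it assumes len > 0 and that the products for each window are stored back to back.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sum consecutive, equal-length segments of a flat array.
// Assumes len > 0; any trailing partial segment is ignored.
std::vector<double> segment_sums(const std::vector<double>& v, std::size_t len) {
    std::vector<double> sums(v.size() / len, 0.0);
    for (std::size_t s = 0; s < sums.size(); ++s) {
        auto first = v.begin() + static_cast<std::ptrdiff_t>(s * len);
        // Half-open range: last points one past the segment, so no "- 1".
        auto last = v.begin() + static_cast<std::ptrdiff_t>((s + 1) * len);
        sums[s] = std::accumulate(first, last, 0.0);
    }
    return sums;
}
```

On the device, thrust::reduce_by_key can sum all segments in one call if each element is given its segment number as a key.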
erike
August 16, 2010, 8:52pm
#7
I am new to Thrust and after going thrugh the examples tried this:
for(i = 0; i < npix*npix; ++i)
{
thrust::transform(ref_d.begin()+i, ref_d.end()+i, ffcorr1.begin(), vec_pp.begin(), thrust::multiplies<double>());
}
for(i = 0; i < npix*npix; ++i)
{
vec_sum[i] = thrust::reduce(vec_pp.begin()+i, vec_pp.end()+i);
}
But this does not produce correct results. When i = 0, the answer is right. Can the thrust::transform be modified to model this behavior?
Is the result after the multiplication wrong as well? Because I think the problem might be your indexing at your second loop. Can you try:
for(i = 0; i < npix; ++i)
{
vec_sum[i] = thrust::reduce(vec_pp.begin()+i*npix, vec_pp.begin()+(i+1)*npix-1);
}
The results after the multiplication are wrong.
As I understand it, I need to multiply 400 values of ref_d with the 400 values of ffcorr1. But ref_d has 676 values, so for each multiplication I want to shift the window into ref_d: from 0 to 400, then 7 to 407, and so on.
Should I be using a permutation_iterator?
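A permutation iterator does fit this problem, but the shift between windows is not 7. With column-major storage, moving the window one row down moves its origin by 1, and moving it one column right moves it by 26; inside a window, consecutive columns are 26 elements apart rather than 20. The sketch below builds the index map for one window and gathers through it, which is what a permutation iterator does lazily (element k of the view is values[indices[k]]). The function names are hypothetical. On the device, the same map should plug into thrust::make_permutation_iterator(ref_d.begin(), idx_d.begin()) combined with thrust::inner_product, though that Thrust version is untested here.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Index map for the window at 0-based origin (i, j): window element
// k = r + c*isize (column-major) lives at ref[(i + r) + (j + c)*refDim].
std::vector<std::size_t> window_index_map(std::size_t i, std::size_t j,
                                          std::size_t isize, std::size_t refDim) {
    std::vector<std::size_t> idx(isize * isize);
    for (std::size_t k = 0; k < idx.size(); ++k) {
        std::size_t r = k % isize;
        std::size_t c = k / isize;
        idx[k] = (i + r) + (j + c) * refDim;
    }
    return idx;
}

// Gather ref through the map (what a permutation iterator does on the fly),
// then multiply elementwise with ffcorr1 and sum in one inner product.
double window_dot_by_gather(const std::vector<double>& ref,
                            const std::vector<double>& ffcorr1,
                            const std::vector<std::size_t>& idx) {
    std::vector<double> window(idx.size());
    for (std::size_t k = 0; k < idx.size(); ++k) window[k] = ref[idx[k]];
    return std::inner_product(window.begin(), window.end(), ffcorr1.begin(), 0.0);
}
```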
the answers after the multiplication is wrong.
As per my understanding, I need to multiply the 400 values of ref_d with 400 values of ffcorr1. But ref_d has 676 values, so I want to shift the values of ref_d from 0 to 400, 7 to 407, etc. for each multiplication.
Should I be using a permuatation_iterator ??
Hello,
With Jacket (http://www.accelereyes.com ), you can get a CUDA version of this directly in M, as follows:
[codebox]
isize = 20;
n = 7;
ffcorr1 = gdouble(ffcorr1); ref = gdouble(ref); % only need to add one line of code
for i = 1:n %%7x7 xcorr
for j = 1:n
xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1,j:j+isize-1))); %%ref is 676 element array and ffcorr1 is a 400 element array
end
end
[/codebox]
You only need to add one line of code and you’re done. You might also play with GFOR (http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage ) to accelerate and auto-vectorize the inner loop. We’re happy to help you if you have any questions: support@accelereyes.com
Best,
John
Hello,
With Jacket (http://www.accelereyes.com ), you can get a CUDA version of this directly in M, as follows:
[codebox]
isize = 20;
n = 7;
ffcorr1 = gdouble(ffcorr1); ref = gdouble(ref); % only need to add one line of code
for i = 1:n %%7x7 xcorr
for j = 1:n
xcout(i,j) = sum(sum(ffcorr1 .* ref(i:i+isize-1,j:j+isize-1))); %%ref is 676 element array and ffcorr1 is a 400 element array
end
end
[/codebox]
You only need to add one line of code and you’re done. You might also play with GFOR (http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage ) to accelerate and auto-vectorize the inner loop. We’re happy to help you if you have any questions: support@accelereyes.com
Best,
John
Matlab is not installed on a machine with a GPU, so I will have to rely on Thrust or CUDPP to implement this.
Matlab is not installed on a machine with GPU. So I will have to rely on Thrust or CUDPP for getting this implemented.