Hi all,
I just wrote some simple code that uses CUDA.net to do a vector summation, but somehow it doesn't give me the correct result. Below are the kernel code and the host code. I really can't find anything wrong. Could somebody help me point out the mistake?
Test.cu:
/* Add two vectors on the GPU */
extern "C" __global__ void vectorAddGPU(float *a, float *b, float *c, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < N)
c[idx] = a[idx] + b[idx];
}
Program.cs:
CUDA cuda = new CUDA(0, true);
string s = Path.Combine(Environment.CurrentDirectory, "Test.ptx");
CUfunction func;
try
{
cuda.LoadModule(s);
func = cuda.GetModuleFunction("vectorAddGPU");
}
catch (CUDAException e)
{
Console.WriteLine(e);
return;
}
float[] a = new float[1 << 10];
for (int i = 0; i < a.Length; i++)
a[i] = i;
float[] b = new float[1 << 10];
for (int i = 0; i < b.Length; i++)
b[i] = 2 * i + 1;
float[] c = new float[1 << 10];
float[] c1 = new float[1 << 10];
CUdeviceptr d_a = cuda.CopyHostToDevice<float>(a);
CUdeviceptr d_b = cuda.CopyHostToDevice<float>(b);
int N = 1<<10;
for (int i = 0; i < N; i++)
c1[i] = -1;
CUdeviceptr d_c = cuda.CopyHostToDevice<float>(c1);
try
{
cuda.SetParameter(func, 0, (uint)d_a.Pointer);
cuda.SetParameter(func, IntPtr.Size, (uint)d_b.Pointer);
cuda.SetParameter(func, IntPtr.Size * 2, (uint)d_c.Pointer);
cuda.SetParameter(func, IntPtr.Size * 3, (uint)N);
cuda.SetParameterSize(func, (uint)(IntPtr.Size * 3 + sizeof(int)));
cuda.SetFunctionBlockShape(func, 1 << 10, 1, 1);
cuda.Launch(func, 1, 1);
cuda.CopyDeviceToHost<float>(d_c, c1);
}
catch (CUDAException e)
{
Console.WriteLine(e);
return;
}
for (int i = 0; i < 1 << 10; i++)
{
c[i] = a[i] + b[i];
if (c1[i] != c[i])
Console.WriteLine("not OK\n");
}
cuda.Free(d_a);
cuda.Free(d_b);
cuda.Free(d_c);
In the code I initialize d_c (the pointer on the device) to -1 everywhere. But after launching the kernel, it still holds the value -1, so the kernel output never seems to reach the host. I really can't figure out what is wrong. Please help me. Thanks.
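One thing I have tried since posting: synchronizing right after the launch to see whether the kernel actually ran, since Launch returns before the kernel finishes and a failed launch would leave d_c untouched. This is just a sketch of what I added inside the existing try block; I am assuming CUDA.net's SynchronizeContext is the right wrapper around cuCtxSynchronize, so please correct me if that is the wrong call:

```csharp
cuda.Launch(func, 1, 1);
// Force completion here so that a launch failure (e.g. a block
// shape the device cannot support) surfaces as an exception now,
// instead of the copy below silently reading back the old -1s.
cuda.SynchronizeContext();  // assumed CUDA.net wrapper for cuCtxSynchronize
cuda.CopyDeviceToHost<float>(d_c, c1);
```

With this in place, any error from the kernel launch should be reported by the surrounding catch (CUDAException e) block rather than getting lost.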