A beginner's CUDA.net problem

Hi all,
I just wrote some simple code that uses CUDA.net to do a vector summation, but somehow it doesn’t give me the correct result. Below are the kernel code and the host code; I really can’t find anything wrong. Could somebody help me point out what is wrong?

Test.cu:

        /* Add two vectors on the GPU */
        extern "C" __global__ void vectorAddGPU(float *a, float *b, float *c, int N)
        {
            int idx = blockIdx.x * blockDim.x + threadIdx.x;
            if (idx < N)
                c[idx] = a[idx] + b[idx];
        }

Program.cs:

        CUDA cuda = new CUDA(0, true);
        string s = Path.Combine(Environment.CurrentDirectory, "Test.ptx");
        CUfunction func;
        try
        {
            cuda.LoadModule(s);
            func = cuda.GetModuleFunction("vectorAddGPU");
        }
        catch (CUDAException e)
        {
            Console.WriteLine(e);
            return;
        }
        float[] a = new float[1 << 10];
        for (int i = 0; i < a.Length; i++)
            a[i] = i;

        float[] b = new float[1 << 10];
        for (int i = 0; i < b.Length; i++)
            b[i] = 2 * i + 1;
        float[] c = new float[1 << 10];
        float[] c1 = new float[1 << 10];            
            
        CUdeviceptr d_a = cuda.CopyHostToDevice<float>(a);            

        CUdeviceptr d_b = cuda.CopyHostToDevice<float>(b);            
                    
        int N = 1<<10;

        for (int i = 0; i < N; i++)
            c1[i] = -1;
        CUdeviceptr d_c = cuda.CopyHostToDevice<float>(c1);
                    
        try
        {
            cuda.SetParameter(func, 0, (uint)d_a.Pointer);
            cuda.SetParameter(func, IntPtr.Size, (uint)d_b.Pointer);
            cuda.SetParameter(func, IntPtr.Size * 2, (uint)d_c.Pointer);
            cuda.SetParameter(func, IntPtr.Size * 3, (uint)N);
            cuda.SetParameterSize(func, (uint)(IntPtr.Size * 3 + sizeof(int)));
            cuda.SetFunctionBlockShape(func, 1<<10, 1, 1);
            cuda.Launch(func, 1, 1);
            cuda.CopyDeviceToHost<float>(d_c, c1);
        }
        catch (CUDAException e)
        {
            Console.WriteLine(e);
            return;
        }            

        for (int i = 0; i < 1 << 10; i++)
        {
            c[i] = a[i] + b[i];
            if (c1[i] != c[i])
                Console.WriteLine("not OK\n");
        }

        cuda.Free(d_a);
        cuda.Free(d_b);
        cuda.Free(d_c);

In the code I initialize the device buffer d_c to -1, but after launching the kernel it still holds the value -1. I really can’t figure out what is wrong. Please help me. Thanks.
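
One thing that might be worth checking is the launch configuration itself. The host code asks for a single block of 1 << 10 threads, and some devices allow fewer threads per block than that (512 on compute capability 1.x hardware). A launch that exceeds the limit is rejected, the kernel body never runs, and the output buffer keeps whatever it held before. The sketch below is plain host-only C++, not CUDA.net code: it emulates the kernel's index computation on the CPU. The names `emulateLaunch`, `blocksFor` and `maxThreadsPerBlock` are made up for illustration, and the limit passed in is a stand-in for whatever the actual device reports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of vectorAddGPU for a 1D grid of 1D blocks.
// maxThreadsPerBlock is a hypothetical stand-in for the device limit.
// Returns false, leaving c untouched, when the launch would be rejected.
bool emulateLaunch(const std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>& c, int n,
                   int gridDimX, int blockDimX, int maxThreadsPerBlock)
{
    // The caller must provide buffers that hold at least n elements.
    assert(n >= 0);
    assert(a.size() >= static_cast<std::size_t>(n));
    assert(b.size() >= static_cast<std::size_t>(n));
    assert(c.size() >= static_cast<std::size_t>(n));

    if (gridDimX <= 0 || blockDimX <= 0 || blockDimX > maxThreadsPerBlock)
        return false;  // invalid configuration: the kernel body never runs

    for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX) {
        for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
            // Same index computation and bounds check as the kernel.
            int idx = blockIdxX * blockDimX + threadIdxX;
            if (idx < n)
                c[idx] = a[idx] + b[idx];
        }
    }
    return true;
}

// Number of blocks needed to cover n elements, rounding up.
int blocksFor(int n, int blockDimX)
{
    return (n + blockDimX - 1) / blockDimX;
}
```

With an assumed limit of 512, `emulateLaunch(a, b, c, 1024, 1, 1024, 512)` returns false and `c` stays at -1, while `emulateLaunch(a, b, c, 1024, blocksFor(1024, 256), 256, 512)` fills all 1024 elements. If this turns out to be the cause, the matching change in the host code would be a smaller block shape together with a larger grid in the launch call. That is only one possible explanation, and it does not rule out a problem in how the parameters are passed.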