Getting wrong results when calling cuBLAS through C++/CLI and C#

I have written a wrapper in C++11/CLI with Visual Studio to use CUDA’s cuBLAS. I am using CUDA Toolkit 7.0.

Here is the source code of my wrapper:

#pragma once

#include "stdafx.h"
#include "BLAS.h"
#include "cuBLAS.h"

namespace lab
{
    namespace Mathematics
    {
        namespace CUDA
        {
            void BLAS::DAXPY(int n, double alpha, const array<double> ^x, int incx, array<double> ^y, int incy)
            {
                // Pin the managed arrays so the garbage collector cannot move them during the native call
                pin_ptr<double> xPtr = &(x[0]);
                pin_ptr<double> yPtr = &(y[0]);
                pin_ptr<double> alphaPtr = &alpha;

                cuBLAS::DAXPY(n, alphaPtr, xPtr, incx, yPtr, incy);
            }
        }
    }
}

To test this code, I wrote the following test in C#:

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Linq;
using lab.Mathematics.CUDA;

namespace lab.Mathematics.CUDA.Test
{
    [TestClass]
    public class TestBLAS
    {
        [TestMethod]
        public void TestDAXPY()
        {
            var count = 10;
            var alpha = 1.0;
            var a = Enumerable.Range(0, count).Select(x => Convert.ToDouble(x)).ToArray();
            var b = Enumerable.Range(0, count).Select(x => Convert.ToDouble(x)).ToArray();

            // Call CUDA
            BLAS.DAXPY(count, alpha, a, 1, b, 1);

            // Validate results
            for (int i = 0; i < count; i++)
            {
                Assert.AreEqual(i + i, b[i]);
            }
        }
    }
}

The program compiles for the x64 architecture without errors, but the results differ every time I run the test. More precisely, the output array b contains different values on each run, and I don’t know why.

I am also adding my CUDA code in case someone can spot a problem there. Note that I don’t get any errors or warnings while compiling. I am also wondering whether I need to change some compilation settings, since I left everything at the default options.

void cuBLAS::DAXPY(int n, const double *alpha, const double *x, int incx, double *y, int incy)
{
    // Allocate GPU memory
    double *devX, *devY;
    cudaMalloc((void **)&devX, (size_t)n*sizeof(*devX));
    cudaMalloc((void **)&devY, (size_t)n*sizeof(*devY));

    // Create cuBLAS handle
    cublasHandle_t handle;

    // Initialize the input matrix and vector
    cublasSetVector(n, sizeof(*devX), x, incx, devX, incx);

    // Call cuBLAS function
    cublasDaxpy(handle, n, alpha, devX, incx, devY, incy);

    // Retrieve resulting vector
    cublasGetVector(n, sizeof(*devY), devY, incy, y, incy);

    // Free GPU resources
}
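
As a point of comparison for the expected values, here is a minimal CPU sketch of what DAXPY computes, y = alpha * x + y. The helper name daxpy_reference is hypothetical (it is not a cuBLAS function), and this sketch assumes positive strides only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for DAXPY (not part of cuBLAS): y = alpha * x + y.
// Assumption: only positive strides are handled in this sketch.
void daxpy_reference(int n, double alpha,
                     const std::vector<double>& x, int incx,
                     std::vector<double>& y, int incy)
{
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; ++i) {
        const std::size_t xi = static_cast<std::size_t>(i) * static_cast<std::size_t>(incx);
        const std::size_t yi = static_cast<std::size_t>(i) * static_cast<std::size_t>(incy);
        // y is read before it is written, so its initial contents affect the result
        y[yi] = alpha * x[xi] + y[yi];
    }
}
```

With x and y both holding 0, 1, ..., 9 and alpha = 1.0, as in the C# test, this gives y[i] = 2 * i. Because y is an input as well as an output, the GPU version can only match if the device copy of y is uploaded before cublasDaxpy runs (for example with cublasSetVector) and the handle has been initialized with cublasCreate. Neither step appears in the code as posted, which may explain why the values change from run to run.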

Hi afshiinzkh,

This is the Nsight Visual Studio forum. For CUDA programming questions, you can ask in the CUDA Programming and Performance forum; for cuBLAS questions, you can ask in the GPU-Accelerated Libraries forum.

Best Regards