Calling a CUDA kernel from a MATLAB mexFunction

A little background:

I am trying to call a simple vector addition function that is calculated on the GPU but called from MATLAB. I used MATLAB's feval before, but it does not give enough freedom to place memory properly for the real task. The big problem is that we need to be able to put the data into shared and local memory to speed things up. The only way I have found to do this is to use the mexFunction interface to call the CUDA kernel, which would (hopefully) let us place the memory where we need it. If anyone has another solution, I would greatly appreciate it.

However, the problem I'm having with the mex interface approach is that I can't get the kernel and the mexFunction to compile and link properly. I'm not entirely sure which commands I need to use. When I try to compile the mex function and link in the kernel, either the kernel function or the launch syntax, kernel<<<grid,block>>>(a,b,c), isn't recognized.

I'm using MATLAB R2011b with all the toolboxes.

VecAdd.cu

__global__ void VecAdd(double *vector1, double *vector2, double *resultVector)
{
    int idx = threadIdx.x;
    resultVector[idx] = vector1[idx] + vector2[idx];
}

VecAddMexFunction.cpp

/*
 * This mexFunction is a test stub to call a Cuda kernel function from Matlab
 *
 * This mexfunction will call C code to add two vectors together.
 */
#include <stdio.h>
#include "mex.h"
#include "cuda.h"
#include "cuda_runtime.h"

extern void vecAdd(double *vector1, double* vector2, int vectSize, double* resultVector);

void mexFunction (int nlhs,
                  mxArray *plhs[],
                  int nrhs,
                  const mxArray *prhs[])
{
    double *vector1, *vector2;
    double *resultVector;
    int row, col;

    /* Check for proper number of arguments */
    if (nrhs != 2) {
        mexErrMsgTxt("Two input arguments required.");
    }
    if (nlhs != 1) {
        mexErrMsgTxt("One output arguments required.");
    }
    if (mxGetM(prhs[0]) != mxGetM(prhs[1]) ||
        mxGetN(prhs[0]) != mxGetN(prhs[1])){
        mexErrMsgTxt("Input vectors must be the same size.");
    }

    /* get the two vectors */
    vector1 = mxGetPr(prhs[0]);
    vector2 = mxGetPr(prhs[1]);
    row = (int)mxGetM(prhs[0]);
    col = (int)mxGetN(prhs[0]);

    /* put the input vectors on the GPU */
    double *device_vect1, *device_vect2, *device_result;
    cudaMemcpy(device_vect1, &vector1, col*sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(device_vect2, &vector2, col*sizeof(double), cudaMemcpyHostToDevice);

    mexPrintf("row: %d, col: %d\n",row,col);

    /* assign the return vector */
    plhs[0] = mxCreateDoubleMatrix(row, col, mxREAL);  /* result vector */
    resultVector = mxGetPr(plhs[0]);

    mexPrintf("calling vecAdd Cuda\n");
    VecAdd<<<1,col>>>(device_vect1, device_vect2, device_result);
    mexPrintf("returned from Cuda\n");

    /* get the resulting vector off the GPU */
    cudaMemcpy(&resultVector, device_result, col*sizeof(double), cudaMemcpyDeviceToHost);

    /* free the device memory */
    cudaFree(device_vect1);
    cudaFree(device_vect2);
    cudaFree(device_result);
}

It should be noted that I'm compiling this on the MATLAB command line.

compiling kernel:

!nvcc -c -arch=sm_13 …/ISARLab-Dev/Cuda_Code/VecAdd.cu
VecAdd.cu
tmpxft_00001438_00000000-3_VecAdd.cudafe1.gpu
tmpxft_00001438_00000000-8_VecAdd.cudafe2.gpu
VecAdd.cu
tmpxft_00001438_00000000-3_VecAdd.cudafe1.cpp
tmpxft_00001438_00000000-14_VecAdd.ii

compiling and linking mexfunction:

mex -I"C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.1/include/" …/ISARLab-Dev/Mex_Files/VectAddMexFunction.cpp VecAdd.obj
VectAddMexFunction.cpp
…\ISARLab-Dev\Mex_Files\VectAddMexFunction.cpp(54) : error C2065: 'VecAdd' : undeclared identifier
…\ISARLab-Dev\Mex_Files\VectAddMexFunction.cpp(54) : error C2059: syntax error : '<'
C:\PROGRA~1\MATLAB\R2011B\BIN\MEX.PL: Error: Compile of '…\ISARLab-Dev\Mex_Files\VectAddMexFunction.cpp' failed.
Error using mex (line 206)
Unable to complete successfully.

I've had a look at a number of other posts which suggest that nvmex does the linking for you, so I downloaded the files from http://www.cs.ucf.edu/~janaka/gpu/using_nvmex.htm to try it that way. I still get the same error messages, plus an additional unknown-architecture warning that I can't seem to fix.

So my questions are:

Am I getting the compilation and linking commands wrong? Am I missing some step needed to accomplish this? Is there something else I could be missing, like an include directory or some files somewhere?

Thanks for the help in advance.
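For what it's worth, the two errors above (C2065 and C2059) are what one would expect whenever the <<<...>>> launch sits in a .cpp file: that syntax is a CUDA extension that only nvcc parses, while mex hands the .cpp to the host compiler (cl here). The posted mex code also never calls cudaMalloc and passes &vector1 (the address of the host pointer) to cudaMemcpy, so it would not work even once it compiled. One common way around all of this is to keep the launch inside the .cu file behind a plain host wrapper. Below is a minimal sketch under that assumption; the names VecAddKernel and vecAdd, the extern "C" linkage, the int return code, and the block size of 256 are my own choices, not anything MATLAB or CUDA requires.

```cuda
// VecAdd.cu: sketch of a kernel plus a host-side wrapper, both compiled by nvcc.
// Assumed build steps from the MATLAB prompt (paths left out on purpose):
//   !nvcc -c -arch=sm_13 VecAdd.cu
//   mex VectAddMexFunction.cpp VecAdd.obj -L"<CUDA toolkit lib directory>" -lcudart
#include <cuda_runtime.h>

// One thread per element. The global index uses the block index as well,
// so the vector length is not limited to a single block; the guard skips
// the surplus threads in the last block.
__global__ void VecAddKernel(const double *a, const double *b, double *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        out[idx] = a[idx] + b[idx];
    }
}

// Hypothetical wrapper with C linkage so that a .cpp built by the host
// compiler can call it without ever seeing CUDA syntax.
// Returns 0 on success, otherwise the cudaError_t value cast to int.
extern "C" int vecAdd(const double *hostA, const double *hostB, int n, double *hostOut)
{
    if (n <= 0) {
        return 0;  // nothing to do; also avoids launching zero blocks
    }

    double *devA = 0, *devB = 0, *devOut = 0;
    size_t bytes = (size_t)n * sizeof(double);

    // Device buffers must be allocated before anything is copied into them.
    cudaError_t err = cudaMalloc((void **)&devA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&devB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&devOut, bytes);

    // Pass the host pointers themselves, not their addresses.
    if (err == cudaSuccess) err = cudaMemcpy(devA, hostA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(devB, hostB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;                         // assumed block size
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
        VecAddKernel<<<blocks, threads>>>(devA, devB, devOut, n);
        err = cudaGetLastError();                  // reports launch failures
    }

    // This copy waits for the kernel to finish before reading the result.
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devA);    // cudaFree on a null pointer is a no-op
    cudaFree(devB);
    cudaFree(devOut);
    return (int)err;
}
```

With something like this, the mex file would only need a matching declaration, extern "C" int vecAdd(const double*, const double*, int, double*);, and a call such as vecAdd(vector1, vector2, row*col, resultVector). The cudaMemcpy and cudaFree calls, the <<<...>>> line, and the CUDA includes could then be dropped from the .cpp. Because this .cu file does not include mex.h, the nvcc step should not need the MATLAB include directory at all. The mex step does need the CUDA runtime library (cudart) on the link line; the -L and -l form shown in the comment is my assumption about how to pass it, so adjust it to your setup.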

Lots of other posts in these forums may help. Here is a recent one where mfatica has a good suggestion: fix the compiler errors by forgoing the nvmex scripts and compiling directly. I also invite you to get a better experience altogether with the Jacket SDK. Good luck!

I tried what was suggested in the post, or something similar.

!nvcc -c -arch=sm_13 …/ISARLab-Dev/Cuda_Code/VecAdd.cu -Xcompiler -fPIC -I"C:/Program Files/MATLAB/R2011b/extern/include"
cl : Command line warning D9002 : ignoring unknown option '-fPIC'

I know I need the -fPIC option to work, but to see whether it would compile without it, I removed it.

!nvcc -c -arch=sm_13 …/ISARLab-Dev/Cuda_Code/VecAdd.cu -Xcompiler -I"C:/Program Files/MATLAB/R2011b/extern/include"
nvcc fatal : Don't know what to do with 'Files/MATLAB/R2011b/extern/include'

I'm not exactly sure which files are meant to be included by the original line, -I /usr/local/matlab/extern/include, but that directory does not exist for me, and I believe the one I substituted is the closest I could find. I thought -I"My directory" would work for directories with spaces. Does anyone know the proper way to include directories with spaces?