Do computation not in a CUDA kernel function but directly in a MEX file containing CUDA code?

Hi, I created a .cu file and want to run it as a mexFunction containing CUDA code. Do I have to write the computation part by invoking a CUDA kernel function, or can I do it directly in the mexFunction, like the for loop in the following code? It looks to me like the integer "i" is not created on the GPU… How can I do this properly?

#include "mex.h"
#include "gpu/mxGPUArray.h"
#include "cuda.h"

void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, mxArray const *prhs[])
{
    double *data1, *Gr0;
    double *gpuID;
    int m, n;

    m = mxGetM(prhs[0]);
    n = mxGetN(prhs[0]);

    plhs[0] = mxCreateDoubleMatrix(m, n, mxREAL);

    data1 = mxGetPr(prhs[0]);
    Gr0 = mxGetPr(plhs[0]);

    gpuID = mxGetPr(prhs[1]);
    int device = gpuID[0];
    const size_t buf_size = m * n * sizeof(double);
    cudaSetDevice(device);
    double *Gs0, *data2;
    int i;
    cudaMalloc(&Gs0, buf_size);
    cudaMalloc(&data2, buf_size);

    cudaMemcpy(Gs0, data1, buf_size, cudaMemcpyDefault);

    for (i = 0; i < 3*3; ++i) {
        data2[i] = 2 * Gs0[i];
    }
    cudaMemcpy(Gr0, data2, buf_size, cudaMemcpyDefault);

}

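A mexFunction runs on the host. Only the CUDA code you put in it (e.g. a kernel that it launches) runs on the GPU. Since your mexFunction has no CUDA kernels in it, none of the computation will run on the GPU. The for loop executes on the host, where Gs0 and data2 are device pointers returned by cudaMalloc and cannot be dereferenced directly, so that arithmetic has to be moved into a kernel.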
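To illustrate, here is a minimal sketch of what moving the loop body into a kernel could look like. The names scale_by_two and scale_on_gpu are hypothetical (they are not from the original post or from any MathWorks API), and the block size of 256 is just an assumed, commonly used value. Each GPU thread computes its own index, which takes the place of the host-side "i".

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread doubles one element. The index i is
// computed per thread on the GPU, replacing the host loop counter.
__global__ void scale_by_two(const double *in, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0 * in[i];
    }
}

// Hypothetical host helper: copies n doubles to the device, launches the
// kernel, and copies the result back into host_out.
cudaError_t scale_on_gpu(const double *host_in, double *host_out, int n)
{
    if (n <= 0) {
        return cudaSuccess;  // nothing to do; avoids a zero-block launch
    }

    double *d_in = NULL, *d_out = NULL;
    const size_t bytes = (size_t)n * sizeof(double);

    cudaError_t err = cudaMalloc(&d_in, bytes);
    if (err != cudaSuccess) {
        return err;
    }
    err = cudaMalloc(&d_out, bytes);
    if (err != cudaSuccess) {
        cudaFree(d_in);
        return err;
    }

    err = cudaMemcpy(d_in, host_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;  // assumed block size
        const int blocks = (n + threads - 1) / threads;
        scale_by_two<<<blocks, threads>>>(d_in, d_out, n);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host_out, d_out, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}
```

In the mexFunction from the question, the cudaMalloc/cudaMemcpy/for-loop section could then be replaced by a single call such as scale_on_gpu(data1, Gr0, m * n), with the returned error code checked before using the output.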

http://www.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code.html