mex function not Cuda kernel

theotheraussie · April 13, 2012, 3:53am

I have been trying to write a mexFunction that calls a CUDA kernel. I ran into some difficulties that I overcame thanks to many posts on this forum, so I thought I'd give it another go. I have managed to get the CUDA kernel compiled and linked into the mexFunction, but it looks like the mexFunction doesn't even call the kernel. I also tried using a .cpp file as an entry point to the kernel, and that doesn't seem to be called either.

/*
 * This mexFunction is a test stub to call a Cuda kernel function from Matlab
 *
 * This mexfunction will call C code to add two vectors together.
 */

#include <stdio.h>
#include <math.h>
#include "mex.h"
#include <cuda.h>
#include <cuda_runtime.h>

#ifdef __cplusplus
extern "C" {
#endif

void VecAddCPP(int grid, int block, float *vector1, float* vector2, float* resultVector);

#ifdef __cplusplus
}
#endif

void mexFunction (int nlhs,
                  mxArray *plhs[],
                  int nrhs,
                  const mxArray *prhs[])
{
    float *vector1, *vector2;
    float *resultVector;
    int row, col;

    /* Check for proper number of arguments */
    if (nrhs != 2) {
        mexErrMsgTxt("Two input arguments required.");
    }
    if (nlhs != 1) {
        mexErrMsgTxt("One output argument required.");
    }
    if (mxGetM(prhs[0]) != mxGetM(prhs[1]) &&
        mxGetN(prhs[0]) != mxGetN(prhs[1])) {
        mexErrMsgTxt("Input vectors must be the same size.");
    }

    /* get the two vectors */
    vector1 = (float *)mxGetPr(prhs[0]);
    vector2 = (float *)mxGetPr(prhs[1]);
    row = (int)mxGetM(prhs[0]);
    col = (int)mxGetN(prhs[0]);

    for (int i = 0; i < col; i++) {
        mexPrintf("1:: Vect1: %g, vect2: %g\n", vector1[i], vector2[i]);
    }

    /* put the input vectors on the GPU */
    float *device_vect1, *device_vect2, *device_result;
    cudaMemcpy(device_vect1, &vector1, col*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(device_vect2, &vector2, col*sizeof(float), cudaMemcpyHostToDevice);
    mexPrintf("row: %d, col: %d\n", row, col);

    /* assign the return vector */
    plhs[0] = mxCreateDoubleMatrix(row, col, mxREAL);  /* result vector */
    resultVector = (float *)mxGetPr(plhs[0]);
    int grid = 1;
    int block = col;

    mexPrintf("calling vecAdd Cuda\n");
    VecAddCPP(grid, block, device_vect1, device_vect2, device_result);
    // VecAddKernelEmulation(grid, block, vector1, vector2, resultVector);
    mexPrintf("returned from Cuda\n");

    /* get the resulting vector off the GPU */
    cudaMemcpy(&resultVector, device_result, col*sizeof(float), cudaMemcpyDeviceToHost);

    mexPrintf("\nResults from GPU\n");
    for (int i = 0; i < col; i++) {
        mexPrintf("5:: Vect1: %g, vect2: %g, result: %g\n", vector1[i], vector2[i], resultVector[i]);
    }

    /* free the device memory */
    cudaFree(device_vect1);
    cudaFree(device_vect2);
    cudaFree(device_result);
}

Boxed_Cylon · April 14, 2012, 8:50pm

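The question is a little outside my expertise, but I notice there seems to be some confusion in your code between single and double precision. "float" is single precision, I believe (not being sarcastic, I just usually program in Fortran…). MATLAB arrays are double by default, and mxGetPr() returns a double pointer, so casting it to (float *) reinterprets the bytes instead of converting the values. The same goes for the output created with mxCreateDoubleMatrix(). Either the variables sent to the mex file should be single precision (single(var)) and the value returned from your kernel should be single as well, or everything should be double. The mex file should then check that the inputs it receives have the correct precision. Perhaps this helps…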

tbenson · April 23, 2012, 1:24am

    /* put the input vectors on the GPU */
    float *device_vect1, *device_vect2, *device_result;
    cudaMemcpy(device_vect1, &vector1, col*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(device_vect2, &vector2, col*sizeof(float), cudaMemcpyHostToDevice);

Are you checking the error codes after your kernel invocation? Above, you cudaMemcpy() to pointers for which no device memory has been allocated. You should presumably have some cudaMalloc() calls before the cudaMemcpy() calls. If you check the return codes of cudaMemcpy(), I expect you will find it is returning an error. The kernel launch will then probably fail as well (cudaGetLastError() would report the launch failure), because the device memory was never allocated.
