CUDA kernel for matrix operations from MATLAB

Hi,

I am using the compilation utility to run a CUDA kernel from MATLAB.
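
(In case it matters, I build the MEX file with the plug-in's nvmex wrapper script, roughly like the line below; the options file name and the CUDA include/library paths are just placeholders for my local setup:

nvmex -f nvmexopts.bat matmul.cu -I"C:\CUDA\include" -L"C:\CUDA\lib" -lcudart)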

I am implementing matrix multiplication to brush up my CUDA skills.

Here is the code that I am using.

#include "cuda.h"

#include "mex.h"

__global__ void mmkernel(float *a, float *b, float *c, int N, int M, int K)

{

   int i = blockIdx.x * 64 + threadIdx.x;

  int j = blockIdx.y;

  int tx = threadIdx.x;

  __shared__ float cb[32];

  float sum0 = 0.0f, sum1=0.0f;

  for (int ks = 0; ks < M; ks+= 32) {

    cb[tx] = b[ks+tx+M*j];

    __syncthreads();

    for (int k = ks; k< ks+32; k++) {

        sum0 += a[i+N*k] * cb[k-ks];

        sum1 += a[i+32+N*k] * cb[k-ks];

    }

    __syncthreads();

  }

  c [i+N*j] = sum0;

  c[i+32+N*j] = sum1;

}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    int i, j, m, n, o, p;
    double *data1, *data2, *m_ans;
    float *h1, *h2, *h3;
    float *d_data1, *d_data2, *d_ans;

    m = mxGetM(prhs[0]);
    n = mxGetN(prhs[0]);
    o = mxGetM(prhs[1]);
    p = mxGetN(prhs[1]);

    plhs[0] = mxCreateDoubleMatrix(m, p, mxREAL);

    data1 = mxGetPr(prhs[0]);
    data2 = mxGetPr(prhs[1]);

    h1 = (float*) mxMalloc(sizeof(float) * m * n);
    h2 = (float*) mxMalloc(sizeof(float) * o * p);

    for (i = 0; i < m * n; i++)
    {
        h1[i] = float(data1[i]);
    }

    for (j = 0; j < o * p; j++)
    {
        h2[i] = float(data2[j]);
    }

    h3 = (float*) mxMalloc(sizeof(float) * m * p);

    cudaMalloc((void**)&d_data1, sizeof(float) * m * n);
    cudaMalloc((void**)&d_data2, sizeof(float) * o * p);
    cudaMalloc((void**)&d_ans, sizeof(float) * m * p);

    cudaMemcpy(d_data1, h1, sizeof(float) * m * n, cudaMemcpyHostToDevice);
    cudaMemcpy(d_data2, h2, sizeof(float) * o * p, cudaMemcpyHostToDevice);

    dim3 dimBlock(32);
    dim3 dimGrid(n / 64, n);

    mmkernel<<<dimGrid, dimBlock>>>(d_data1, d_data2, d_ans, m, n, p);

    cudaMemcpy(h3, d_ans, sizeof(float) * m * p, cudaMemcpyDeviceToHost);

    m_ans = mxGetPr(plhs[0]);

    for (j = 0; j < m * p; j++)
    {
        m_ans[j] = (double) h3[j];
    }

    mxFree(h1);
    mxFree(h2);
    mxFree(h3);

    cudaFree(d_data1);
    cudaFree(d_data2);
    cudaFree(d_ans);
}

After passing the two matrices

f = ones(512,512) and g = ones(512,512)

to h = matmul(f,g), where matmul is the MEX function built from the code above, I am getting all values of h as 0.

Can anyone give me a hint as to where I am going wrong?

Thanks in advance.

Cheers.

Perhaps you could begin by simplifying your kernel to the point where you are certain it is working correctly. A simple element-wise matrix addition, output = data1 + data2, should give you output = 2*ones(512,512) in your example above.

This way you can deduce whether the error is in the host code or the device code.
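
For what it's worth, a minimal sketch of such a test kernel might look like the following. It assumes the inputs have already been converted to single precision and copied to the device, exactly as in your mexFunction; the kernel name and launch configuration are just for illustration.

// One thread per element; both matrices are treated as flat arrays of n floats.
__global__ void addkernel(const float *a, const float *b, float *c, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        c[idx] = a[idx] + b[idx];
}

// Launched from mexFunction in place of mmkernel, reusing the existing buffers:
//   int total = m * n;
//   dim3 block(256);
//   dim3 grid((total + block.x - 1) / block.x);
//   addkernel<<<grid, block>>>(d_data1, d_data2, d_ans, total);

If h then comes back as 2*ones(512,512), the double-to-float conversion, the cudaMemcpy calls, and the copy back into plhs[0] are all fine, and you can concentrate on mmkernel; if the result is still wrong, the problem is on the host side.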