CUBLAS Hadamard Product SHAD

Is it possible to perform an elementwise matrix product with CUBLAS? I mean, to obtain the Hadamard product of two matrices, as the BLAS SHAD routine or MATLAB’s ‘.*’ operator does.

SHAD is not included in CUBLAS.
However, an elementwise product is a very simple kernel to write yourself.
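
For illustration, here is a minimal sketch of such a kernel. It assumes both matrices are stored contiguously with the same layout, so they can be treated as flat arrays of n = rows*cols floats. The name hadamardKernel, the launch configuration, and the use of managed memory (which needs a reasonably recent CUDA toolkit) are my own choices for the example, not anything provided by CUBLAS.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel name: computes z[i] = x[i] * y[i] for i in [0, n).
__global__ void hadamardKernel(const float *x, const float *y, float *z, int n)
{
    // Grid-stride loop: stays correct even if n exceeds the number of launched threads.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        z[i] = x[i] * y[i];
}

int main()
{
    const int rows = 256, cols = 256, n = rows * cols;
    float *x, *y, *z;

    // Managed memory keeps the sketch short; cudaMalloc + cudaMemcpy works as well.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    cudaMallocManaged(&z, n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        x[i] = float(i % 7);
        y[i] = 2.0f;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
    hadamardKernel<<<blocks, threads>>>(x, y, z, n);
    cudaDeviceSynchronize();

    // Compare against the same product computed on the host.
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (z[i] != x[i] * y[i]) ++errors;
    printf("mismatches: %d\n", errors);

    cudaFree(x);
    cudaFree(y);
    cudaFree(z);
    return errors != 0;
}
```

Because every element is read once and written once, shared memory brings no benefit here; the kernel is limited by global memory bandwidth.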

I know, but I am quite new to CUDA. I am trying to multiply two 256x256 matrices elementwise (A times the identity I). My blocks are of size 1x256, so I created a 256x1 grid. My code is below, but it writes zeros everywhere except in the last row.

xBlock=256 and yBlock=1 are just the block dimensions. Could someone please give me some advice? I think I am missing some important concept about kernel programming. Thanks in advance:

__global__ void shadKernel(float *x, float *y, float *z, int nL, int nC)
{
    __shared__ float zS;

    // Block index
    //int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Index to the start of the block
    int blockIni = by*nC*yBlock;

    // Shared memory for the submatrices to compute within each block
    __shared__ float xS[yBlock][xBlock];
    __shared__ float yS[yBlock][xBlock];

    // Load the matrices from global memory to shared memory
    xS[ty][tx] = x[blockIni + xBlock * ty + tx];
    yS[ty][tx] = y[blockIni + xBlock * ty + tx];

    // Multiply submatrices
    zS = xS[ty][tx]*yS[ty][tx];

    // Write the result to global memory
    z[blockIni + xBlock * ty + tx] = zS;
}
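
Two things in this kernel look suspicious, assuming the launch really uses a grid of 256x1 blocks with 256x1 threads each. First, blockIdx.y is always 0 in such a grid, so every block computes the same blockIni and works on the same row; the row index would have to come from blockIdx.x instead. Second, zS is a single shared scalar that all threads of a block write at the same time, which is a race; a per-thread local variable avoids it. The sketch below applies both changes and drops the shared arrays, since each element is used only once. It assumes row-major storage and nC no larger than the maximum number of threads per block; the host code and the test values are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch of a corrected kernel: one block per row, one thread per column.
__global__ void shadKernel(const float *x, const float *y, float *z, int nL, int nC)
{
    int row = blockIdx.x;    // the grid is 1D in x, so the row comes from blockIdx.x
    int col = threadIdx.x;

    if (row < nL && col < nC) {
        int idx = row * nC + col;        // row-major linear index
        float prod = x[idx] * y[idx];    // per-thread local, not __shared__
        z[idx] = prod;
    }
}

int main()
{
    const int nL = 256, nC = 256, n = nL * nC;
    float *a, *id, *z;

    // Managed memory is used only to keep the example short.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&id, n * sizeof(float));
    cudaMallocManaged(&z, n * sizeof(float));

    // A holds arbitrary test values, id is the identity, z starts with a sentinel.
    for (int r = 0; r < nL; ++r)
        for (int c = 0; c < nC; ++c) {
            a[r * nC + c] = 1.0f + r + c;
            id[r * nC + c] = (r == c) ? 1.0f : 0.0f;
            z[r * nC + c] = -1.0f;
        }

    shadKernel<<<nL, nC>>>(a, id, z, nL, nC);  // nL blocks of nC threads
    cudaDeviceSynchronize();

    // A .* I should keep the diagonal of A and zero everything else.
    int errors = 0;
    for (int r = 0; r < nL; ++r)
        for (int c = 0; c < nC; ++c) {
            float expected = (r == c) ? a[r * nC + c] : 0.0f;
            if (z[r * nC + c] != expected) ++errors;
        }
    printf("mismatches: %d\n", errors);

    cudaFree(a);
    cudaFree(id);
    cudaFree(z);
    return errors != 0;
}
```

Initialising z with a sentinel value makes it easy to spot rows that no block ever wrote, which is the symptom described in the question.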