Is it possible to perform an elementwise matrix product with CUBLAS? That is, to obtain the Hadamard product of two matrices, as the BLAS SHAD routine or MATLAB’s ‘.*’ operator does.

SHAD is not included in CUBLAS.
It is a very simple kernel to write.
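
To illustrate how simple such a kernel can be, here is a minimal sketch. The names `hadamardKernel` and `hadamard` are made up for this example, and it assumes both matrices are stored contiguously as `float` with the same layout, so they can be treated as flat arrays of `n = rows * cols` elements:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread computes z[i] = x[i] * y[i].
// The Hadamard product has no coupling between elements, so the
// matrices can be indexed as flat arrays of n elements.
__global__ void hadamardKernel(const float *x, const float *y, float *z, int n)
{
    // Grid-stride loop: stays correct even if n exceeds the thread count.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        z[i] = x[i] * y[i];
    }
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel, and copies the result back. Returns the first CUDA error seen.
cudaError_t hadamard(const float *hx, const float *hy, float *hz, int n)
{
    if (n <= 0) return cudaSuccess;  // nothing to do

    float *dx = nullptr, *dy = nullptr, *dz = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    cudaError_t err = cudaMalloc(&dx, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dy, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dz, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up
        hadamardKernel<<<blocks, threads>>>(dx, dy, dz, n);
        err = cudaGetLastError();  // catches launch configuration errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(hz, dz, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dx);  // cudaFree(nullptr) is a no-op
    cudaFree(dy);
    cudaFree(dz);
    return err;
}
```

For two 256x256 matrices, the call would be `hadamard(a, b, c, 256 * 256)`. No shared memory is used because every input element is read exactly once.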

I know, but I am quite new to CUDA. I am trying to multiply two 256x256 matrices elementwise (A times the identity I). My blocks are of size 1x256, so I created a 256x1 grid. My code is below, but it just writes zeros everywhere except on the last row.

xBlock=256 and yBlock=1 are just the block sizes. Could someone please give me some advice? I think that I am missing some important concept about kernel programming. Thanks in advance:

```
__global__ void shadKernel(float *x, float *y, float *z, int nL, int nC)
{
    __shared__ float zS;

    // Block index
    //int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Index to the start of the block
    int blockIni = by*nC*yBlock;

    // Shared memory for the submatrices to compute within each block
    __shared__ float xS[yBlock][xBlock];
    __shared__ float yS[yBlock][xBlock];

    // Load the matrices from global memory to shared memory
    xS[ty][tx] = x[blockIni + xBlock * ty + tx];
    yS[ty][tx] = y[blockIni + xBlock * ty + tx];

    // Multiply submatrices
    zS = xS[ty][tx]*yS[ty][tx];

    z[blockIni + xBlock * ty + tx] = zS;
}
```
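
Two things in the kernel above look suspicious, although without the launch code this is only a guess. First, `zS` is declared `__shared__`, so every thread in a block writes to and reads from the same single float without any synchronization, which is a race. Second, if the grid really is `dim3(256, 1)`, then `blockIdx.y` is always 0 and every block works on the same row. A possible rewrite is sketched below; `shadKernel2D` and `launchShad2D` are hypothetical names, and the sketch assumes row-major storage with `nL` rows and `nC` columns and pointers that already refer to device memory:

```cuda
#include <cuda_runtime.h>

// Hypothetical 2D variant: one thread per element of an nL x nC
// row-major matrix. No shared memory is needed, since each element
// is read once and the product lives in a per-thread register.
__global__ void shadKernel2D(const float *x, const float *y, float *z,
                             int nL, int nC)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row (line) index

    // Guard against threads that fall outside the matrix.
    if (row < nL && col < nC) {
        int idx = row * nC + col;
        z[idx] = x[idx] * y[idx];
    }
}

// Hypothetical launcher. dX, dY and dZ are assumed to be device pointers
// to nL * nC floats each.
cudaError_t launchShad2D(const float *dX, const float *dY, float *dZ,
                         int nL, int nC)
{
    dim3 block(256, 1);  // 256 threads along x, 1 along y
    dim3 grid((nC + block.x - 1) / block.x,   // blocks needed per row
              (nL + block.y - 1) / block.y);  // one block row per matrix row
    shadKernel2D<<<grid, block>>>(dX, dY, dZ, nL, nC);
    return cudaGetLastError();  // reports launch configuration errors
}
```

With these block dimensions, a 256x256 matrix gives a grid of `dim3(1, 256)`: the block's 256 threads run along x (the columns), and the 256 blocks run along y (the rows), which is why the row index has to come from `blockIdx.y`.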