Is it possible to perform an elementwise matrix product with CUBLAS? I mean, to obtain the Hadamard product of two matrices, like the BLAS SHAD routine or MATLAB's `.*` operator.

SHAD is not included in CUBLAS.

It is a very simple kernel to write.
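For reference, a minimal sketch of such a kernel — the names and launch configuration here are illustrative, not anything provided by CUBLAS:

```
// Elementwise (Hadamard) product: each thread computes one element.
// n is the total number of elements (rows * cols for a dense matrix
// stored contiguously).
__global__ void hadamard(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] * b[i];
}

// Example launch for a 256x256 matrix:
//   int n = 256 * 256;
//   hadamard<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);
```

Since every element is read exactly once, no shared memory or tiling is needed for this operation.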

I know, but I am quite new to CUDA. I am trying to multiply two 256x256 matrices elementwise (A times the identity I). My blocks are of size 1x256, so I created a 256x1 grid. My code is as follows, but it just puts zeros everywhere except on the last line.

xBlock=256 and yBlock=1 are just the block dimensions. Could someone please give me some advice? I think I am missing some important concept about kernel programming. Thanks in advance:

```
__global__ void shadKernel(float *x, float *y, float *z, int nL, int nC)
{
    __shared__ float zS;

    // Block index
    //int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Index to the start of the block
    int blockIni = by*nC*yBlock;

    // Shared memory for the submatrices to compute within each block
    __shared__ float xS[yBlock][xBlock];
    __shared__ float yS[yBlock][xBlock];

    // Load the matrices from global memory to shared memory
    xS[ty][tx] = x[blockIni + xBlock * ty + tx];
    yS[ty][tx] = y[blockIni + xBlock * ty + tx];
    __syncthreads();

    // Multiply submatrices
    zS = xS[ty][tx]*yS[ty][tx];
    __syncthreads();

    // Write the result to global memory
    z[blockIni + xBlock * ty + tx] = zS;
}
```
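One likely culprit in the code above: `zS` is a single `__shared__ float` shared by all 256 threads of the block. Every thread races to write its own product into it, and after the `__syncthreads()` each thread writes the same surviving value back to its slot of `z` — so a whole block ends up with one (arbitrary) product instead of 256 distinct ones. Also worth double-checking: the kernel indexes the grid with `blockIdx.y`, so the launch must actually vary blocks along y; with a grid declared as 256 blocks along x, `blockIdx.y` is always 0. A sketch of a fix, assuming `xBlock = 256` and `yBlock = 1` are compile-time constants as in the original:

```
// Illustrative fix: keep each thread's product in a private register
// (or write it straight to global memory) rather than in one
// __shared__ float raced on by the whole block.
__global__ void shadKernel(float *x, float *y, float *z, int nL, int nC)
{
    int by = blockIdx.y;     // must match how the grid is launched
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int idx = by * nC * yBlock + xBlock * ty + tx;

    z[idx] = x[idx] * y[idx];
}
```

Since each element is touched exactly once, the shared-memory staging buys nothing here; `__shared__` (and `__syncthreads()`) only pay off when several threads reuse the same loaded data, as in a tiled matrix multiply.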