Is it possible to perform an elementwise matrix product with CUBLAS? I mean, to obtain the Hadamard product of two matrices, as the BLAS SHAD routine or MATLAB's `.*` operator does.
SHAD is not included in CUBLAS.
It is a very simple kernel to write.
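For example, a minimal elementwise-product kernel could treat both matrices as flat arrays and use one thread per element. This is an illustrative sketch, not a library routine; the name, signature, and launch configuration are all just one reasonable choice:

```cuda
// Hypothetical sketch: z[i] = x[i] * y[i] for a matrix stored
// contiguously in memory; n is the total number of elements.
__global__ void hadamard(const float *x, const float *y, float *z, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard the last, possibly partial, block
        z[i] = x[i] * y[i];
}
```

Launched with something like `hadamard<<<(n + 255) / 256, 256>>>(dx, dy, dz, n);` on device pointers. No shared memory is needed because each output element depends only on the two inputs at the same index.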
I know, but I am quite new to CUDA. I am trying to multiply two 256x256 matrices (A times the identity I). My blocks are of size 1x256, so I created a 256x1 grid. My code is as follows, but it puts zeros everywhere except on the last row. xBlock = 256 and yBlock = 1 are just the block dimensions. Could someone please give me some advice? I think I am missing some important concept about kernel programming. Thanks in advance:
__global__ void shadKernel(float *x, float *y, float *z, int nL, int nC)
{
    // Block index
    //int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Index to the start of the block
    int blockIni = by * nC * yBlock;

    // Shared memory for the submatrices to compute within each block
    __shared__ float xS[yBlock][xBlock];
    __shared__ float yS[yBlock][xBlock];
    __shared__ float zS;

    // Load the matrices from global memory to shared memory
    xS[ty][tx] = x[blockIni + xBlock * ty + tx];
    yS[ty][tx] = y[blockIni + xBlock * ty + tx];
    __syncthreads();

    zS = xS[ty][tx] * yS[ty][tx];
    __syncthreads();

    // Write the result to global memory
    z[blockIni + xBlock * ty + tx] = zS;
}
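One likely culprit: `zS` is a single shared float for the entire block, so all 256 threads race to write it, and every thread then stores whichever value was written last. For a purely elementwise operation no shared memory is needed; each thread can compute and write its own element. A sketch of that fix, keeping the same indexing and assuming xBlock = 256 and yBlock = 1 as above:

```cuda
__global__ void shadKernel(float *x, float *y, float *z, int nL, int nC)
{
    int by = blockIdx.y;
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Linear index of this thread's element in row-major storage
    int idx = by * nC * yBlock + xBlock * ty + tx;

    // Each thread writes its own product; no shared memory, no races
    z[idx] = x[idx] * y[idx];
}
```

Shared memory pays off when threads reuse each other's loads (as in tiled matrix multiplication); here every input is read exactly once, so staging through `__shared__` only adds the opportunity for the race you are seeing.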