Hello all!
I’m new to CUDA.
I changed the matrixMul example project to use a different algorithm:
__global__ void
matrixMul( float* C, float* A, float* B, int wA, int wB)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;
    // Thread index
    int tx = threadIdx.x;
    // Declaration of the shared memory arrays As1 and Bs1, used to
    // store one row of A and one row of B
    __shared__ float As1[WA];
    __shared__ float Bs1[WA];
    As1[tx]=A[bx*WA+tx];
    Bs1[tx]=B[by*HB+tx];
    // Synchronize to make sure the rows are loaded
    __syncthreads();
    __shared__ float res1[WA];
    __shared__ float res2;
    __shared__ float res3;
    // Each thread computes one element-wise product
    res1[tx]=As1[tx]*Bs1[tx];
    __syncthreads();
    // Thread 0 sums the first half of the products
    if (tx==0)
        for (int i=0;i<WA/2;i++)
            res2=res2+res1[i];
    // Thread 1 sums the second half of the products
    if (tx==1)
        for (int i=WA/2;i<WA;i++)
            res3=res3+res1[WA/2+i];
    // Synchronize to make sure that both partial sums
    // are done before writing the result
    __syncthreads();
    C[by*WC+bx]=res2+res3;
}
The build works fine, but when I launch it I get:
Windows has triggered a breakpoint in matrixMul.exe.
This may be due to a corruption of the heap, and indicates a bug in matrixMul.exe or any of the DLLs it has loaded.
The output window may have more diagnostic information
Thanks a lot, even for just a pointer in the right direction.
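
One possible direction, as a minimal sketch rather than a definitive fix: in the kernel above, res2 and res3 are never initialized before being accumulated into, and the second loop reads res1[WA/2+i] with i already starting at WA/2, which runs past the end of res1. The sketch below keeps the same indexing (one block per output element, C[by*wC+bx] = row bx of A dotted with row by of B) but replaces the two serial loops with a shared-memory tree reduction. The names rowDot, runRowDot and BLOCK are hypothetical, and it assumes the rows of A and B have the same length n (that is, HB == WA). The heap-corruption message itself comes from the host side, so the host code, which is not shown in the post, may also need checking; the helper below only illustrates consistent buffer sizes and a launch error check.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 128;  // threads per block (assumed value); must be a power of two

// One block per output element, as in the posted kernel:
// C[by*wC + bx] = dot(row bx of A, row by of B), where n is the row length.
__global__ void rowDot(float* C, const float* A, const float* B, int n, int wC)
{
    __shared__ float partial[BLOCK];
    int tx = threadIdx.x;

    // Each thread accumulates a strided slice of the products in a register,
    // so n does not have to equal the block size.
    float sum = 0.0f;
    for (int i = tx; i < n; i += BLOCK)
        sum += A[blockIdx.x * n + i] * B[blockIdx.y * n + i];
    partial[tx] = sum;
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (tx < stride)
            partial[tx] += partial[tx + stride];
        __syncthreads();
    }

    // Only one thread writes the result.
    if (tx == 0)
        C[blockIdx.y * wC + blockIdx.x] = partial[0];
}

// Hypothetical host helper: A is rowsA x n, B is rowsB x n, C becomes rowsB x rowsA.
// Returns false if the sizes are inconsistent or any CUDA call fails.
bool runRowDot(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, int rowsA, int rowsB, int n)
{
    if (rowsA <= 0 || rowsB <= 0 || n <= 0) return false;
    if (A.size() != (size_t)rowsA * n || B.size() != (size_t)rowsB * n) return false;
    C.assign((size_t)rowsA * rowsB, 0.0f);

    size_t bytesA = A.size() * sizeof(float);
    size_t bytesB = B.size() * sizeof(float);
    size_t bytesC = C.size() * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytesA) == cudaSuccess
           && cudaMalloc(&dB, bytesB) == cudaSuccess
           && cudaMalloc(&dC, bytesC) == cudaSuccess
           && cudaMemcpy(dA, A.data(), bytesA, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, B.data(), bytesB, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 grid(rowsA, rowsB);  // blockIdx.x picks the row of A, blockIdx.y the row of B
        rowDot<<<grid, BLOCK>>>(dC, dA, dB, n, rowsA);
        // Check the launch, then copy back (the copy waits for the kernel to finish).
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(C.data(), dC, bytesC, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}
```

Calling runRowDot with a small hand-checked input is a quick way to confirm the indexing before plugging the kernel back into the matrixMul project. The main design choice is that every size is passed as an argument instead of coming from compile-time macros, so the host allocations and the kernel indexing cannot silently disagree.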