Hi all,
I’ve been developing an NMF implementation with CUDA on a GTX 280. Everything was fine, but that card broke, and now I’m running the code on an 8800 GTX. I know that atomic operations don’t work on G80 cards, but I have a different problem. I upload two matrices to device memory and operate on them (I split them into 4 pieces and multiply them with CUBLAS), and when execution reaches a kernel that does a point-by-point multiplication or division, it freezes.
Here is the code of the kernel:
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE 16

// Element-wise division H = A / B for an alto x ancho matrix (row-major).
// Zeros in B are replaced by a small epsilon to avoid dividing by zero.
__global__ void DivPunto(float* A, float* B, int alto, int ancho, float* H)
{
    int bx = blockIdx.x;
    int by = blockIdx.y;
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int posicion;

    __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
    __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

    // Global linear index of this thread's element
    posicion = (by*BLOCK_SIZE + ty)*ancho + bx*BLOCK_SIZE + tx;

    if ((bx*BLOCK_SIZE + tx) < ancho && (by*BLOCK_SIZE + ty) < alto)
    {
        // Stage the two tiles in shared memory
        As[ty][tx] = A[posicion];
        Bs[ty][tx] = B[posicion];
        __syncthreads();

        // Guard against division by zero
        if (Bs[ty][tx] == 0.0f)
            Bs[ty][tx] = 0.00001f;

        H[posicion] = As[ty][tx] / Bs[ty][tx];
    }
}
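For reference, here is a plain CPU version of the same point-by-point division (with the same zero guard) that I use to check the GPU output against — just a sketch of the intended math, not the kernel itself:

```c
#include <stddef.h>

/* CPU reference of the element-wise division H = A ./ B over an
   alto x ancho row-major matrix, replacing zeros in B with the same
   epsilon the kernel uses so the two results are comparable. */
void div_punto_cpu(const float *A, const float *B, int alto, int ancho, float *H)
{
    for (int i = 0; i < alto * ancho; ++i) {
        float b = B[i];
        if (b == 0.0f)
            b = 0.00001f;   /* same guard as in the kernel */
        H[i] = A[i] / b;
    }
}
```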
I use bidimensional blocks of 16x16 threads, and the register and shared memory usage reported by ptxas is this:
ptxas info : Used 6 registers, 2096+1072 bytes smem, 20 bytes cmem[1]
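In case it matters, this is how I size the grid for those 16x16 blocks (the usual ceiling division; the threads that fall outside the matrix are the ones filtered by the bounds check in the kernel):

```c
#define BLOCK_SIZE 16

/* Number of 16-wide blocks needed to cover n elements in one dimension.
   The grid is then blocks_needed(ancho) x blocks_needed(alto). */
int blocks_needed(int n)
{
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
}
```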
This code was running flawlessly on the GTX 280… and I don’t know what is going wrong now.
Thanks in advance
(and sorry for my English)