Screen flicker, then program behaves badly

I am currently running CUDA 2.0b2 on a GTX 280, and I am encountering some very strange behavior. I'm calling CUDA routines from a VC++ program that allocates the memory once and then passes pointers. It works fine in Debug, but produces the wrong output if I link against the Multi-threaded DLL instead of the Multi-threaded Debug DLL runtime libraries. Second, when running with the debug libraries it works fine for a few iterations (the code is called over and over again), then the screen flickers and it stops working correctly.

In fact, if I run the code again afterwards it runs slower and does the same thing, and any other program that uses the graphics card performs very poorly until I reboot.

I think I'm making some sort of horrible memory error. All the memory is allocated in the main program with cudaMalloc and then filled using cudaMemcpy. Then the ldos routine is called; it calls integrated, then summation, which uses the reduction code given in the SDK.

Any help about what I am doing wrong would be greatly appreciated.

__global__ void integrated(float * en, float gamma1, float gamma2, float D00, float B, float gaps,
                           float * gapm, float * gapm2, float * ekm, float * distance, float * sum) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float gapma, gamma = gamma1 + gamma2 * en[i];
    if (fabs(gapm[i] * gaps) <= D00) {
        gapma = gapm[i] * gaps * B + gapm2[i] * gaps * (1.0 - B);
    } else {
        gapma = gaps * gapm[i];
    }
    float selfei = -gamma - gapma * gapma * gamma / ((en[i] + ekm[i]) * (en[i] + ekm[i]) + gamma * gamma);
    float selfer = gapma * gapma * (en[i] + ekm[i]) / ((en[i] + ekm[i]) * (en[i] + ekm[i]) + gamma * gamma);
    sum[i] = selfei / ((en[i] - ekm[i] - selfer) * (en[i] - ekm[i] - selfer) + (selfei * selfei)) * (distance[i]);
}

__host__ void summation(float * sum, float * reducedout, int xsize, int ysize, int nosts,
                        float * sum2, float * poop, float * tempadd) {
    int blocks, threads, blocks2, threads2;
    getNumBlocksAndThreads(6, xsize * ysize, xsize * 0.5, ysize * 0.5, blocks, threads);
    getNumBlocksAndThreads(6, blocks, xsize * 0.5, xsize * 0.5, blocks2, threads2);
    dim3 dimBlock(threads, 1, 1);
    dim3 dimGrid(blocks, 1, 1);
    for (int i = 0; i < nosts; i++) {
        cudaMemset((void **) &tempadd, 0, 2.0 * xsize * sizeof(float));
        reduce(ysize * xsize, threads, blocks, sum, tempadd, xsize, ysize, i);
        reduce(blocks, threads2, blocks2, tempadd, poop, 0, 0, 0);
        cudaMemcpy(sum2, poop, blocks2 * sizeof(float), cudaMemcpyDeviceToHost);
        reducedout[i] = sum2[0];
    }
}

__host__ void ldos(float * en, float gamma1, float gamma2, float D00, float B, float gaps,
                   float * gapm, float * gapm2, float * ekm, float * distance, int xsize,
                   int ysize, int nosts, float * output, float * sum, float * outputh,
                   float * sum2, float * tempadd, float * poop) {
    int N, gridsize;
    N = xsize * ysize * nosts;
    dim3 dimBlock(256);
    gridsize = N / 256;
    dim3 dimGrid(gridsize);
    cudaMemset((void **) &sum, 0, N * sizeof(float));
    integrated<<<dimGrid, dimBlock>>>(en, gamma1, gamma2, D00, B, gaps, gapm, gapm2, ekm, distance, sum);
    summation(sum, output, xsize, ysize, nosts, sum2, poop, tempadd);
}
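
The host-side setup isn't shown above, but per the description it allocates everything once with cudaMalloc and fills it with cudaMemcpy before ldos is ever called. A minimal sketch of that pattern (the h_en/d_en names and sizes are illustrative, not from the original program):

    int N = xsize * ysize * nosts;    // total number of elements
    float * d_en;
    cudaMalloc((void **) &d_en, N * sizeof(float));                       // allocate once, up front
    cudaMemcpy(d_en, h_en, N * sizeof(float), cudaMemcpyHostToDevice);    // fill from the host array
    // ... repeated for gapm, gapm2, ekm, distance, sum, tempadd, poop ...
    // the same device pointers are then reused on every call to ldos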

Sounds like the kernel is accessing memory it's not supposed to access. You should make sure that all kernel threads either access valid memory locations or return early. For example, if your input buffer provides 3 inputs but you always launch 128 threads, pass the number of valid inputs as a kernel parameter and add "if (threadIdx.x > maxInput-1) return;".
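
Applied to the integrated kernel above (which computes a global index, so the guard uses i rather than threadIdx.x), that would look something like this; nelems is an assumed extra parameter carrying the number of valid elements:

    __global__ void integrated(float * en, /* ...other parameters... */ float * sum, int nelems) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i > nelems - 1) return;    // threads past the end of the buffers do nothing
        // ... body as before, now guaranteed to stay in bounds ...
    }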

Thanks, I gave that a shot, but no luck; it still crashes after about the same amount of running time. Is there any utility or other way to look at memory utilization on the card while the code is running?
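
As an aside, free and total device memory can be queried programmatically between launches; later toolkits expose cudaMemGetInfo in the runtime API, and the driver API has had cuMemGetInfo since the early releases. A sketch, assuming a runtime that provides the call:

    size_t freeBytes, totalBytes;
    cudaMemGetInfo(&freeBytes, &totalBytes);    // free and total device memory, in bytes
    printf("device memory: %lu free of %lu total\n",
           (unsigned long) freeBytes, (unsigned long) totalBytes);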

I stripped the code down to the bare minimum and ran it 10000 times without error, so I'm wondering if there is some problem with the way the pointers to the device memory are being passed. Is there a specific way this has to be done inside non-CUDA C++ code?
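
For what it's worth, no special marshalling is needed: a device pointer is an opaque value, so the usual arrangement is to compile the kernels and launch code with nvcc and expose plain extern "C" entry points to the rest of the program. A sketch of that arrangement (run_ldos is an illustrative name, not from the original code):

    // kernels.cu -- compiled by nvcc
    extern "C" void run_ldos(float * d_en, float * d_sum /* , ... */) {
        // launch kernels here using the device pointers handed in
    }

    // main.cpp -- compiled by the VC++ compiler
    extern "C" void run_ldos(float * d_en, float * d_sum /* , ... */);
    // ...
    run_ldos(d_en, d_sum);    // device pointers pass through like any other pointer value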

Hmmm. If I add cudaThreadSynchronize() after calling the integrated routine, it runs fine for a while and then returns an unspecified launch failure. Anyone have any ideas?
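
"Unspecified launch failure" is the error the runtime reports when a kernel dies (often from an out-of-bounds access), and checking for it right after the launch is how to pin down which call is failing. A sketch using the calls already mentioned in this thread:

    integrated<<<dimGrid, dimBlock>>>(en, gamma1, gamma2, D00, B, gaps, gapm, gapm2, ekm, distance, sum);
    cudaError_t err = cudaThreadSynchronize();    // waits for the kernel and returns its status
    if (err != cudaSuccess)
        printf("integrated failed: %s\n", cudaGetErrorString(err));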

Well it looks like the new drivers fixed this problem.

Hello, I'm using the latest CUDA driver with a Quadro FX 1700 under XP. If some memory allocation or copy operation goes wrong, subsequent runs of the (now fixed) application are extremely slow until the computer is restarted. Is there some way to reset the video card without restarting everything? Thanks.

I never figured out a way to reset the card. In fact, when I had the problems, other programs that used the card would also behave poorly until I rebooted. Fortunately for me, the latest drivers fixed the problems. Sorry. You could try to grab the errors using CUDA_SAFE_CALL and error = cudaThreadSynchronize(); that might help.
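
For reference, CUDA_SAFE_CALL comes from the SDK's cutil.h and, at least in the SDK versions of that era, only performs the error check in debug builds (in release it compiles to a bare call). Combined with the synchronize suggested above, usage looks roughly like:

    CUDA_SAFE_CALL(cudaMemcpy(sum2, poop, blocks2 * sizeof(float), cudaMemcpyDeviceToHost));
    cudaError_t error = cudaThreadSynchronize();    // picks up asynchronous launch failures
    if (error != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(error));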

I don't get this under Vista (Win2k8). If the app misbehaves, the driver restarts itself automatically. If I do this too many times, however, the video does start to get corrupted, but I never see performance degradation.

Also, you might try something like putting the computer to sleep, standby, or hibernate to try to reset the driver.