I have the following matrix multiplication code, implemented with CUDA 3.2 and VS 2008, running on Windows Server 2008 R2 Enterprise with an NVIDIA GTX 480. The code works fine for values of “Width” (the matrix width) up to about 2500, finishing in under a second.
[codebox]int size = Width*Width*sizeof(float);
float* Md, *Nd, *Pd;
cudaError_t err = cudaSuccess;
//Allocate Device Memory for M, N and P
err = cudaMalloc((void**)&Md, size);
err = cudaMalloc((void**)&Nd, size);
err = cudaMalloc((void**)&Pd, size);
//Copy Matrix from Host Memory to Device Memory
err = cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);
err = cudaMemcpy(Nd, N, size, cudaMemcpyHostToDevice);
MatrixMultiplicationMultiBlock_Kernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);
//Copy the result back (blocks until the kernel has finished)
err = cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost);
//Free Device Memory
err = cudaFree(Md);
err = cudaFree(Nd);
err = cudaFree(Pd);
[/codebox]
When I set the “Width” to 3000 or greater, I get the following error after a black screen:
I looked online and saw that some people had this issue because the watchdog (TDR) was killing the kernel once it ran longer than the number of seconds specified in “TdrDelay”. I added TdrDelay as a REG_DWORD under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers and set it to 30 seconds. After 30 seconds, I get the same error. When I set TdrLevel to 0, it just freezes: I get no error, but my machine stops responding. Am I exceeding memory capacity somewhere? Any help would be greatly appreciated!