3D Memory Allocation: any connection to "Display Driver Stopped Responding"?

Hello all,

I’m struggling with a little piece of code here. I’ve been using CUDA with Matlab, running Vista64 Ultimate and NVIDIA’s Tesla D870.

I first allocate memory for a 3D matrix in device memory, after making sure that the device I’m working on is the Tesla (device 0) and not the internal 8600GT (device 1):

cudaSetDevice(0);

float *WBSresult;

errCuda = cudaMalloc((void **) &WBSresult, sizeof(float) * N * N * numberOfWindows);

I then allocate memory for the Matlab output variable in host memory:

const mwSize dims[]={N,N,numberOfWindows};

plhs[0]=mxCreateNumericArray(3,dims,mxSINGLE_CLASS,mxREAL);

float *ar4;

ar4 = (float *) mxGetPr(plhs[0]);

I then run the code on the device:

ComputeWBS<<<32, 256>>>(WBSresult);

I should note that ComputeWBS is a very compute-intensive kernel; for large data files it runs hundreds of thousands of iterations and takes a few minutes to complete.

And I then copy the data from device to host:

errCuda = cudaMemcpy(ar4, WBSresult, sizeof(float) * N * N * numberOfWindows, cudaMemcpyDeviceToHost);

The problem: For small matrices this works fine. For large ones I get the infamous “Display Driver Stopped Responding” message.

First, I wonder why I even get this message when I’m not running my code on the card that’s connected to the display.

In this post and in others that I found on this forum the general recommendation is to use a GPU that’s not the primary one that’s connected to the display. That’s exactly what I did and I’m getting this error message.

Second, I wonder if this has something to do with the fact that I’m using cudaMalloc and cudaMemcpy for 3D matrices instead of cudaMalloc3D and cudaMemcpy3D, which the documentation recommends for 2D and 3D allocations.

I tried converting my code to cudaMalloc3D and cudaMemcpy3D and got lost somewhere along the way. I’m not certain this has anything to do with my problem, and I don’t know if the conversion is worth the hassle if it doesn’t. The code does work with fewer iterations or smaller matrix dimensions, so can it really be related to 3D allocation?

Any help rendered will be highly appreciated…

Thank you.

Y.

UPDATE:

I’m not sure why this happens, but every time I post a question on these forums I end up finding the solution all by myself 5 minutes after posting the question… :-).

I found the solution to the error message here:
http://www.microsoft.com/whdc/device/displ…dm_timeout.mspx
I ended up adding a “TdrLevel” value under HKLM\System\CurrentControlSet\Control\GraphicsDrivers and it worked. No more error message.
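For anyone hitting the same message: the value is a REG_DWORD, and as I understand the page above, TdrLevel = 0 turns timeout detection off entirely (I needed a reboot for it to take effect). From an elevated command prompt, something like:

```shell
reg add "HKLM\System\CurrentControlSet\Control\GraphicsDrivers" /v TdrLevel /t REG_DWORD /d 0 /f
```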

The code now runs to completion and produces correct results.

My new question now: if everything works fine with cudaMalloc and cudaMemcpy, why do I really need cudaMalloc3D and cudaMemcpy3D? What’s the point? Will it make things faster, or is it just a matter of convenience?

The question from my last post still stands: is it worth converting the code to 3D allocations now?

Thank you!

Y.