matrix multiplication using global memory

Hi everyone!

I'm studying CUDA at the moment.

I've attached my code.

It's the matrix multiplication example from the NVIDIA programming guide.

I think the code is correct, but with a larger matrix size, for example 4096 x 4096 or bigger,

it fails with this error message:

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

cudaSafeCall() Runtime API error in file <c:/Visual Studio 2005/Projects/matrixMul/matrixMul _global.cu>, line 118 : the launch timed out and was terminated.

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

With a 4096 x 4096 matrix there should still be enough global memory.

My environment: Windows 7, CUDA 2.3, GTX 260.

Could you tell me what the problem is?

Thanks, everyone!
matrixMul.zip (300 KB)

There is a time limit of roughly 2-5 seconds per kernel on GPUs with an attached display. That error is telling you that your kernel is taking too long to finish and is being killed by the display driver, which is why it only happens at large matrix sizes. Either make your code faster, limit yourself to smaller problems, or use a dedicated compute GPU to get around the problem.
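One way to "limit yourself to smaller problems" without shrinking the matrix is to split the work across several shorter kernel launches. Below is a minimal sketch of that idea, not the code from the attached zip: `matMulBand` and `matMulInBands` are hypothetical names, the kernel is a plain global-memory multiply of square row-major matrices, and the band height is something you would have to tune so that each launch stays under the limit on your card. It uses `cudaDeviceSynchronize()`; on an old toolkit such as CUDA 2.3 the equivalent call is `cudaThreadSynchronize()`.

```cuda
#include <cuda_runtime.h>

// Naive global-memory kernel for C = A * B (all n x n, row-major).
// Each launch only computes rows [rowStart, rowStart + rowCount) of C.
__global__ void matMulBand(const float *A, const float *B, float *C,
                           int n, int rowStart, int rowCount)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = rowStart + blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= n || row >= rowStart + rowCount) return;

    float sum = 0.0f;
    for (int k = 0; k < n; ++k)
        sum += A[row * n + k] * B[k * n + col];
    C[row * n + col] = sum;
}

// Hypothetical host helper: one kernel launch per band of `bandRows` rows.
// dA, dB and dC are device pointers. Returns the first CUDA error seen.
cudaError_t matMulInBands(const float *dA, const float *dB, float *dC,
                          int n, int bandRows)
{
    dim3 block(16, 16);
    for (int rowStart = 0; rowStart < n; rowStart += bandRows) {
        int rowCount = (n - rowStart < bandRows) ? (n - rowStart) : bandRows;
        dim3 grid((n + block.x - 1) / block.x,
                  (rowCount + block.y - 1) / block.y);

        matMulBand<<<grid, block>>>(dA, dB, dC, n, rowStart, rowCount);

        // Catch bad launch configurations right away.
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess) return err;

        // Wait for this band to finish before submitting the next one,
        // so that no single launch runs for long.
        err = cudaDeviceSynchronize();
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

For a 4096 x 4096 problem you might call `matMulInBands(dA, dB, dC, 4096, 256)` and lower the band height if the timeout still triggers. This only works around the limit; a faster kernel (for example the shared-memory version in the programming guide) reduces the total run time as well.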

I’ve never understood this restriction. I have written kernels that take 40 minutes to finish, which is terrible I know, but I’m just saying. In fact, I have never had the timeout error occur. Does this depend on which GPU I am running?

It is pretty pragmatic. Would you prefer your desktop to completely freeze for the duration of a kernel/rendering application (possibly forever if the app/kernel crashes), or would you prefer that the display manager proactively ensure that the hardware it is managing continues to work?

If you have no display manager on the card, there is no timer. It is apparently possible to disable it on some flavours of WDDM Windows via registry hackery. In Linux, if you don't extend the display manager onto the card, the limit doesn't apply. I have dedicated compute cards which can run one kernel for minutes/hours without problems. It is only on cards with an active display that the timer is a problem/feature.
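If you are not sure whether the timer applies to a given card, the runtime can tell you: `cudaDeviceProp` has a `kernelExecTimeoutEnabled` field that is non-zero when there is a run-time limit on kernels for that device. Here is a small sketch that checks it; the helper names `hasKernelTimeout` and `printTimeoutStatus` are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if the run-time limit applies to `device`, 0 if it does not,
// and -1 if the device properties could not be queried.
int hasKernelTimeout(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}

// Prints the timeout status of every CUDA device.
// Returns the number of devices, or -1 if the count could not be queried.
int printTimeoutStatus(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        printf("Device %d (%s): kernel timeout %s\n", d, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return count;
}
```

Calling `printTimeoutStatus()` from `main` on a machine with one display card and one compute-only card should report "enabled" for the first and "disabled" for the second, which would also explain why some people never run into the limit.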

Haha, I didn't mean for it to sound like I thought the restriction was stupid. I merely meant I didn't know why I kept hearing about it when it never seemed to apply to me. I've never taken measures to disable it, so I would never have known that the so-called restriction was even there.

So... how can I solve the error? :)
