Limits on matrix-matrix multiplication

I was a little curious about whether the limits on matrix size when running matrix-matrix multiplication could be due to memory. But I found that sizes greater than 4800x4800 for matrices A and B (where C = A * B) can crash my system. I received a message from Windows telling me that the NVIDIA driver was not working properly and that the system had to be rebooted. Why does this happen? What are the possible solutions for processing matrices with bigger sizes, at least up to the memory limits?



Which card are you using and how much memory do you have in your system?

I can run a 7680x7680 multiplication on a Quadro FX 5600.

How many cards do you have in the system? If you have only one card, the Windows watchdog will not be happy if you are using CUDA for more than 5 seconds.

You can always split a big matrix multiplication into multiple smaller ones.


This is the 5-second bug described in the release notes.

Basically, if your computation takes longer than 5 seconds on a GPU connected to a display, there is a system freeze.

To avoid it, disconnect the CUDA GPU from the display and get yourself a cheap display card (or another 8800 :^)).

On a card with no display attached, you can go up to the size limits.

Thanks a lot, I will connect a second card tomorrow.


I have a question: does the same problem exist in Linux? I’m running Fedora 6, and I’ll be using the 8800 GTX. Thanks a lot!

Yes, it exists in Linux as well if you run an X11 server on the card.

Same as in Windows, if there is no display attached to the card, there is no limit.

So in Linux you can boot to runlevel 3 (text console) and load the nvidia driver manually or in boot.local. You need the driver to communicate with the card. Then you can run without the limit; this works nicely for remote server machines.

Or you can use a second card. Hint: many chipsets have a built-in graphics chip that you can run the X11 server on, so the 8800 is free for CUDA.


I can confirm this on Red Hat Linux 4.2, although the limit can be larger there (~5600).

Peter, are you sure this is correct? There are several postings in the CUDA on Windows and Linux sections that demonstrate problems running kernels longer than 5 or 6 seconds – even when the GPU is not driving a display.

I can’t speak to Linux, but under Windows XP this occurs even if the desktop is not extended to the GPU. Although it doesn’t crash the system, the kernel runs for its full length of time and then returns an error of the form: “Cuda error: Kernel execution failed in file ‘’ in line xxx: unspecified launch failure.”

If you’ve been able to achieve reliable kernel runtimes > 6 seconds under Windows, please tell us how you did it!

  • Under Windows XP, I have an FX 5200 as the primary display, and 8800 GTS as the GPU.

  • The GPU appears as Monitor 3 under Display Properties->Settings.

  • “Use this device as primary monitor” and “Extend my Windows desktop onto this monitor” are both UNCHECKED.

  • I’m using the version 97.73 driver.

Maybe I do not understand the question, but we have run calculations of several hundred seconds on our second 8800 GTX without doing anything special and without any problems. I am not next to the machine now, so I can’t give you the settings.

Here’s a reference thread – there are others under the Linux forum:

If you’re able to run 100+ second kernels under Windows XP, could you post your settings? What are your primary and secondary cards?