Allocation error convolving a 4096x4096 image with a 7x7 kernel?

I just tried the given example with a 4096x4096 image and a 7x7 kernel. The program failed with a memory allocation error. I also tried a smaller image, so that it would be padded to 4096x4096, and it still failed with a memory allocation error.

What is the maximum size I can use?
Why do I get a memory allocation error when I have a 768MB 8800GTX?
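
For reference, here is a rough back-of-the-envelope for the size of a single 4096x4096 buffer. It is only a sketch: it assumes single-precision data (4 bytes per real element, 8 bytes per complex element) and treats the card's 768MB as 768 MiB. The helper name `buffer_mib` is made up for illustration, and I have not checked how many such buffers the example actually allocates.

```python
# Rough size of one dense 4096x4096 buffer.
# Assumptions: single-precision data; the example's real buffer count is unknown.
MIB = 1024 * 1024

def buffer_mib(width, height, bytes_per_element):
    """Return the size in MiB of a dense width x height buffer."""
    return width * height * bytes_per_element / MIB

real_mib = buffer_mib(4096, 4096, 4)     # one float per element
complex_mib = buffer_mib(4096, 4096, 8)  # two floats per element (complex)
card_mib = 768                           # 8800 GTX memory, as stated above

print(f"real buffer:    {real_mib:.0f} MiB")
print(f"complex buffer: {complex_mib:.0f} MiB")
print(f"complex buffers that fit in {card_mib} MiB: {int(card_mib // complex_mib)}")
```

By this estimate a single buffer is well under 768MB, so the failure would have to come from several buffers being allocated at once, or from memory that is already in use on the card.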