Diagnosing cudaError_enum error messages

MSVC 2005 gives this error if I try to start more than 256 threads per block. I looked in the docs and read that compute capability 1.1 devices can handle up to 512 threads per block. Is there a way to find out what is going on behind the scenes? I am not using pointers and am very conservative with memory.

I get this with cutilCheckMsg:

What resources are limited?

Have you double-checked the grid and block parameters you use to launch the kernel?
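
One way to see what is going on behind the scenes is to ask the runtime for the device limits and for the kernel's own resource usage, then check the launch result directly. The sketch below is only an illustration: `myKernel`, the grid size of 4, and the output buffer are hypothetical stand-ins for the code in the question, and it uses the current runtime API (`cudaFuncGetAttributes`, `cudaDeviceSynchronize`), which may differ from what an older toolkit provides. A common cause of a launch that works at 256 threads but fails at 512 is that registers per thread times threads per block exceeds the per-block register limit, but that is an assumption here, not something the thread confirms.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel from the question.
__global__ void myKernel(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = 2.0f * i;
}

int main()
{
    const int threadsPerBlock = 512;  // the block size that fails in the question
    const int blocks = 4;             // assumed grid size, for illustration only

    // Device-wide limits for device 0.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("device: %s (compute %d.%d)\n", prop.name, prop.major, prop.minor);
    printf("  max threads per block : %d\n", prop.maxThreadsPerBlock);
    printf("  registers per block   : %d\n", prop.regsPerBlock);
    printf("  shared mem per block  : %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);

    // Resources this particular kernel needs, as compiled.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, myKernel);
    printf("kernel:\n");
    printf("  registers per thread  : %d\n", attr.numRegs);
    printf("  static shared memory  : %lu bytes\n",
           (unsigned long)attr.sharedSizeBytes);
    printf("  max threads per block : %d\n", attr.maxThreadsPerBlock);

    if (threadsPerBlock > attr.maxThreadsPerBlock)
        printf("warning: %d threads per block exceeds this kernel's limit of %d\n",
               threadsPerBlock, attr.maxThreadsPerBlock);

    float *d_out = NULL;
    cudaMalloc((void **)&d_out, blocks * threadsPerBlock * sizeof(float));

    myKernel<<<blocks, threadsPerBlock>>>(d_out);

    // Launch-configuration errors are reported here, right after the launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("launch failed: %s\n", cudaGetErrorString(err));

    // Errors raised while the kernel runs are reported on synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}
```

If the kernel's reported maximum threads per block is below 512, the block size has to come down or the kernel's register use has to shrink. Compiling with `nvcc --ptxas-options=-v` prints the per-kernel register and shared memory usage at build time, which gives the same information without running anything.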