Launching more than 512 threads per block goes undetected?

I just started using CUDA on my Mac. I ran this simple hello-world kernel with printf() disabled and, on purpose, requested 40,000 threads on a grid with just one block.
This can’t run because of the 512-thread maximum per block, yet CUT_CHECK_ERROR() did not detect any problem.
Is there something obvious I’m missing?

#include <cutil.h>

__global__ void helloWorld();

int main(int argc, char** argv)
{
    CUT_DEVICE_INIT(argc, argv);

    // set up execution parameters
    dim3 thrdInBlk(200, 200);  // 200 x 200 = 40,000 threads in one block
    dim3 blkInGrid(1);

    // execute the kernel
    helloWorld<<< blkInGrid, thrdInBlk >>>();

    // check whether kernel execution generated an error
    CUT_CHECK_ERROR("Kernel execution failed");

    return 0;
}

__global__ void
helloWorld()
{
    // Synchronize to make sure data is loaded
    __syncthreads();
}


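CUT_CHECK_ERROR doesn’t actually do anything unless you’re running a debug build; in a release build the macro compiles to nothing, so the failed launch goes unreported. You should not rely on cutil.h to do your error checking. Query the CUDA runtime yourself instead, for example with cudaGetLastError() right after the launch.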
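
For illustration, here is a minimal sketch of checking the launch without cutil.h. It assumes a reasonably recent toolkit, where cudaDeviceSynchronize() is available (older toolkits used cudaThreadSynchronize() for the same purpose). launchAndCheck is a hypothetical helper name made up for this example, not something from cutil or the CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: the body is irrelevant here, only the launch matters.
__global__ void helloWorld() {}

// Hypothetical helper (not part of cutil or the CUDA runtime): launches the
// kernel and returns the first error the runtime reports.
cudaError_t launchAndCheck(dim3 grid, dim3 block)
{
    helloWorld<<<grid, block>>>();

    // Launch-configuration problems (such as too many threads per block)
    // are reported by cudaGetLastError() right after the launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        return err;
    }

    // Errors raised while the kernel runs only surface after a sync.
    return cudaDeviceSynchronize();
}

int main()
{
    // 200 x 200 = 40,000 threads in a single block, as in the question.
    cudaError_t err = launchAndCheck(dim3(1), dim3(200, 200));
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel execution failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Kernel ran without errors\n");
    return 0;
}
```

With the 200 x 200 block from the question, the check after the launch should report an error instead of silently continuing, and it behaves the same in debug and release builds because it does not depend on any macro being enabled.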