Emulation mode in Matlab stalling when using a block dimension of 16x16

Hi

I have some simple code that basically just adds numbers. I launch it with the following configuration:

dim3 dimBlock( 16, 8 );
dim3 dimGrid( 16, 16 );

This works fine and returns the result in less than 1 second. If I increase the block dimension to 16x16, Matlab simply hangs with no CPU or memory activity, and I have to kill the process through Windows. Is there any reason why this would happen?

/Cheers

Henrik Andresen

I have found some new info as well.

The stall happens whenever I launch with more than 155 threads per block. Changing the number of blocks doesn't make any difference. As soon as I launch with 156 threads per block, the entire process stalls with no CPU usage whatsoever.
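
To make the numbers concrete, here is a small host-only sketch of the arithmetic. It only multiplies out the block dimensions and compares them with the limit I observed. `Dim3` is a hypothetical stand-in for CUDA's `dim3` so it compiles without the toolkit, and `kObservedMaxThreads`, `threadsPerBlock` and `expectedToStall` are names I made up for illustration. The value 155 is just what I measured on my machine, not a documented limit.

```cpp
// Hypothetical stand-in for CUDA's dim3, so this builds without the CUDA toolkit.
struct Dim3 {
    unsigned x, y, z;
};

// Largest per-block thread count that still ran for me.
// This is an observation from my setup, not a documented CUDA limit.
const unsigned kObservedMaxThreads = 155;

// Threads per block is the product of the three block dimensions.
unsigned threadsPerBlock(const Dim3& block) {
    return block.x * block.y * block.z;
}

// True if the block size is above the threshold at which I saw the stall.
bool expectedToStall(const Dim3& block) {
    return threadsPerBlock(block) > kObservedMaxThreads;
}
```

With `Dim3 block = {16, 8, 1};` this gives 128 threads per block, which is under the threshold and matches the configuration that works. With `{16, 16, 1}` it gives 256 threads per block, which is over the threshold and matches the configuration that stalls.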

I compile using VC++ Express 2008, Matlab 2010a and the latest CUDA, and I link against cudartemu.lib as well. I am running Windows 7, 32-bit edition.

Would anybody have an idea of what is happening?

Is this the correct part of the forum, or should I post somewhere else?

Hmm… forums crashed on me. So just a double post