AffinityMask changed: process reduced to a single core


I adapted the CUDA “multithreading.cpp” example from the SDK to drive two CUDA devices for some test calculations. I then added a second loop to compare the CPU's speed against the CUDA devices, and wondered why only one core was used even though I asked for eight threads. After some searching, it seems that something in CUDA (2.1 beta) reduces the process affinity mask to 1! If I set it to values greater than 1, things seem to go wrong.

Are there any limitations regarding multithreading that I haven't read about? Or has anybody seen similar behaviour and knows which part is responsible for reducing the process to one core?
(Vista 64 Bit, VS2008)
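
For anyone who wants to check this on their own machine, here is a minimal sketch of how the mask could be inspected before and after CUDA is initialised. The helper names coresInMask and reportAffinity are made up for illustration; the only real API used is the Win32 call GetProcessAffinityMask, and on other platforms the sketch just prints a note instead of querying anything.

```cpp
#include <cstdint>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: count the bits set in an affinity mask.
// Each set bit is one logical core the process may run on.
int coresInMask(std::uint64_t mask) {
    int count = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}

// Hypothetical helper: print the current process and system masks.
// Returns the number of cores the process may use, or -1 if the
// query failed or is not implemented for this platform.
int reportAffinity(const char* label) {
#ifdef _WIN32
    DWORD_PTR processMask = 0;
    DWORD_PTR systemMask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask)) {
        std::printf("%s: GetProcessAffinityMask failed (error %lu)\n",
                    label, static_cast<unsigned long>(GetLastError()));
        return -1;
    }
    std::printf("%s: process mask 0x%llx (%d cores), system mask 0x%llx (%d cores)\n",
                label,
                static_cast<unsigned long long>(processMask),
                coresInMask(static_cast<std::uint64_t>(processMask)),
                static_cast<unsigned long long>(systemMask),
                coresInMask(static_cast<std::uint64_t>(systemMask)));
    return coresInMask(static_cast<std::uint64_t>(processMask));
#else
    // Assumption: this sketch only covers Windows, as in the setup above.
    std::printf("%s: affinity query not implemented for this platform\n", label);
    return -1;
#endif
}
```

Calling reportAffinity("before CUDA") at the start of main and reportAffinity("after CUDA") after the first CUDA call should show whether the process mask shrinks to a single bit while the system mask stays the same.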

The potential affinity issue is being discussed in this thread.

Oops, sorry, I didn't find that while searching.

The problem is solved now: it was the CUDA_PROFILE=1 environment variable that affected the affinity.
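
In case it helps someone else, a small sketch of a startup check that warns when the variable is set. The function names profilerEnabled and cudaProfileSet are made up for illustration, and treating only the value "1" as "enabled" is an assumption based on my own setup; I have not checked how other values are handled.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true when the given value would switch the
// profiler on. Assumption: only "1" counts, as in CUDA_PROFILE=1.
bool profilerEnabled(const char* value) {
    return value != 0 && std::strcmp(value, "1") == 0;
}

// Hypothetical helper: read CUDA_PROFILE from the environment and
// print a warning if it is set to 1.
bool cudaProfileSet() {
    const bool enabled = profilerEnabled(std::getenv("CUDA_PROFILE"));
    if (enabled) {
        std::printf("warning: CUDA_PROFILE=1 is set; "
                    "CPU threads may end up on a single core\n");
    }
    return enabled;
}
```

Calling cudaProfileSet() before the CPU comparison loop makes it obvious when the timing is going to be skewed by the profiler setting.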