Does CUDA 4.2 support concurrent kernels?

I tried to run the concurrent kernels example from the CUDA 4.2 SDK on an M2090. It seems the kernels don’t run at the same time. Has anyone run into a similar situation?