Problem with cudaMallocHost and concurrent kernels

I'm trying to run two instances of the same kernel at the same time on a GTX 460 (compute capability 2.1) with CUDA 3.2. However, the cudaMallocHost call returns an "invalid argument" error. I know that asynchronous kernel launches need memory allocated with cudaMallocHost, but I still get the error.

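You have to allocate the host memory with cudaMallocHost instead of malloc; don't replace the cudaMalloc call with cudaMallocHost. The two calls do different jobs: cudaMalloc allocates device memory, which the kernel works on, while cudaMallocHost allocates page-locked (pinned) host memory, which cudaMemcpyAsync needs so that copies can run asynchronously. You still need both.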
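A minimal sketch of that pattern, assuming a made-up kernel `scale`, an arbitrary buffer size, and hypothetical helpers `CHECK` and `run_demo` (none of these come from the original question). Host buffers come from cudaMallocHost, device buffers still come from cudaMalloc, and each kernel gets its own stream. Whether the two kernels actually overlap is up to the hardware and driver: compute capability 2.x devices can run kernels concurrently, but CUDA does not guarantee it.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiply every element by factor.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Print the failing call and bail out (demo only: skips cleanup on error).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

static int run_demo(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_a, *h_b;  // page-locked HOST buffers
    float *d_a, *d_b;  // DEVICE buffers

    // Host side: cudaMallocHost takes the place of malloc.
    CHECK(cudaMallocHost((void **)&h_a, bytes));
    CHECK(cudaMallocHost((void **)&h_b, bytes));
    // Device side: keep using cudaMalloc.
    CHECK(cudaMalloc((void **)&d_a, bytes));
    CHECK(cudaMalloc((void **)&d_b, bytes));

    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    cudaStream_t s[2];
    CHECK(cudaStreamCreate(&s[0]));
    CHECK(cudaStreamCreate(&s[1]));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Async copies require the pinned host buffers allocated above.
    CHECK(cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s[0]));
    CHECK(cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s[1]));
    // Same kernel launched in two different streams.
    scale<<<blocks, threads, 0, s[0]>>>(d_a, n, 3.0f);
    scale<<<blocks, threads, 0, s[1]>>>(d_b, n, 3.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s[0]));
    CHECK(cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s[1]));

    // Wait for both streams before reading the host buffers.
    CHECK(cudaStreamSynchronize(s[0]));
    CHECK(cudaStreamSynchronize(s[1]));

    int ok = (h_a[0] == 3.0f && h_b[n - 1] == 6.0f);

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFreeHost(h_a);  // pinned memory is released with cudaFreeHost, not free
    cudaFreeHost(h_b);
    return ok ? 0 : 1;
}

int main(void)
{
    return run_demo();
}
```

The key point is the pairing: cudaMallocHost/cudaFreeHost for the host buffers you copy from and to, cudaMalloc/cudaFree for the buffers the kernel reads and writes.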

