Concurrent Copy and Execution, and Page-Locked Memory Mapping

Hi,

I have a program that allocates page-locked memory with cudaMallocHost and copies it with cudaMemcpyAsync. On a 9600GT it compiles without any errors, but at run time I get a segmentation fault.
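For context, here is a minimal sketch of the pattern described above. It is not my actual code: the kernel `scale`, the size `N`, and the `CHECK` macro are placeholders made up for illustration. Every runtime call is checked, on the assumption that an unchecked failure (for example, a cudaMallocHost that did not succeed) could leave an invalid host pointer and show up later as a segmentation fault.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel: doubles every element.
__global__ void scale(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main()
{
    const int N = 1 << 20;                 // illustrative size
    const size_t bytes = N * sizeof(float);
    float *h = NULL;                       // page-locked host buffer
    float *d = NULL;                       // device buffer
    cudaStream_t stream;

    CHECK(cudaMallocHost((void **)&h, bytes));  // pinned host allocation
    CHECK(cudaMalloc((void **)&d, bytes));
    CHECK(cudaStreamCreate(&stream));

    for (int i = 0; i < N; ++i) h[i] = 1.0f;

    // Asynchronous copy in, kernel, and copy out, all queued on one stream.
    CHECK(cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream));
    scale<<<(N + 255) / 256, 256, 0, stream>>>(d, N);
    CHECK(cudaGetLastError());             // catches kernel launch errors
    CHECK(cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream));

    // The host must not read h until the stream has finished.
    CHECK(cudaStreamSynchronize(stream));
    printf("h[0] = %f\n", h[0]);

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));                // pinned memory is freed with cudaFreeHost
    return 0;
}
```

If one of the checked calls reports an error before the crash, that would at least narrow down whether the allocation, the copy, or the kernel launch is the step that fails.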

1- Is this because my graphics card does not support page-locked memory mapping?
When I run deviceQuery, it reports that my graphics card supports Concurrent Copy and Execution. This puzzles me: if the card does not support page-locked memory mapping (and cudaMemcpyAsync therefore would not work), how can it support Concurrent Copy and Execution?

2- Is there any other way to achieve concurrent copy and execution apart from using page-locked memory mapping and cudaMemcpyAsync?
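To compare the two deviceQuery properties from question 1 directly, here is a minimal sketch that queries them through the runtime API. I am assuming that the two deviceQuery lines correspond to the attributes cudaDevAttrGpuOverlap and cudaDevAttrCanMapHostMemory, and that the toolkit is recent enough to provide cudaDeviceGetAttribute; device index 0 is also an assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const int device = 0;      // assumption: the 9600GT is device 0
    int overlap = 0;           // concurrent copy and execution
    int canMapHost = 0;        // host page-locked memory mapping

    cudaError_t err = cudaDeviceGetAttribute(&overlap,
                                             cudaDevAttrGpuOverlap, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "overlap query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaDeviceGetAttribute(&canMapHost,
                                 cudaDevAttrCanMapHostMemory, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "map-host query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The two values are reported independently of each other.
    printf("Concurrent copy and execution:   %s\n", overlap ? "yes" : "no");
    printf("Page-locked host memory mapping: %s\n", canMapHost ? "yes" : "no");
    return 0;
}
```

Since the runtime exposes these as two separate attributes, printing them side by side should show whether the card can report the first without the second.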

Thanks in advance