cudaThreadSynchronize() and Device-Mapped Host Memory in CUDA 2.2

I’m thinking about using the new feature that maps host memory into the device address space for an output buffer, which I read on the host after my kernel finishes. Is it sufficient to call cudaThreadSynchronize() after the kernel to ensure that all writes by the device have been flushed back to host memory? The documentation mentions streams and events, but cudaThreadSynchronize() would be more straightforward for my simple CUDA usage.

I believe the answer is yes: cudaThreadSynchronize() blocks the host until all previously launched device work has completed. I’ll double-check, but it should be fine.
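For reference, here is a minimal sketch of the pattern being asked about. It assumes a device that reports canMapHostMemory; the kernel name `fill` and the buffer size are hypothetical. The mapped-memory flag has to be set before the CUDA context is created, and the host only reads the buffer after cudaThreadSynchronize() returns. Note that later toolkits deprecate cudaThreadSynchronize() in favor of cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes 2*i into each element of a mapped host buffer.
__global__ void fill(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * i;
}

int main() {
    const int n = 1024;

    // Assumption: device 0 supports mapped (zero-copy) host memory.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "device cannot map host memory\n");
        return 1;
    }

    // Must be called before any call that creates the CUDA context.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Page-locked host buffer that is also mapped into device address space.
    float *h_out = NULL;
    if (cudaHostAlloc((void **)&h_out, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed\n");
        return 1;
    }

    // Device-side alias of the same buffer.
    float *d_out = NULL;
    cudaHostGetDevicePointer((void **)&d_out, h_out, 0);

    fill<<<(n + 255) / 256, 256>>>(d_out, n);

    // Kernel launches are asynchronous: block until the kernel (and its
    // writes to mapped memory) has completed before touching h_out.
    // On newer toolkits, cudaDeviceSynchronize() replaces this call.
    if (cudaThreadSynchronize() != cudaSuccess) {
        fprintf(stderr, "kernel failed\n");
        cudaFreeHost(h_out);
        return 1;
    }

    printf("%f\n", h_out[10]);  // safe to read on the host now

    cudaFreeHost(h_out);
    return 0;
}
```

If you later move to streams, the equivalent is to record an event after the kernel and call cudaEventSynchronize() on it, which waits only for that point in the stream rather than for all device work.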