Is there a way to find memory access errors on the GPU?

I found a strange error in my program.
I have two GTX295 cards, 4 GPUs in total, and run 4 host threads, one per GPU. The 4 threads run independently and don't write to the same host memory; they only read from the same mapped page-locked memory.
If I run the 4 threads at the same time, the results are wrong. If I run them one after another, the results are correct.

Because I have more than one thread, I can't use cuda-gdb. I also failed to run the program in -deviceemu mode, because allocating page-locked memory fails there.
I guess there may be a memory access error in global memory.
Is there any method to find such errors?

I believe pinned memory is only valid within the GPU context that allocated it (unless it was allocated with the cudaHostAllocPortable flag).

Try unpinned memory.

Thank you.

I have found where the error is: some data that was supposed to be read-only was being changed in the threads.
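For anyone chasing the same kind of bug, here is a minimal host-side sketch of one way to detect it: checksum the shared input buffer before the work runs and again after it finishes. This is not the code from my program. The names `checksum` and `leaves_input_unchanged` are made up for illustration, and `work` is a hypothetical stand-in for whatever launches the host threads and waits until all of them (and their kernels) have finished.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// FNV-1a hash over a byte range. Any checksum would do; this one is
// just short and has no dependencies.
std::uint64_t checksum(const void* data, std::size_t bytes) {
    assert(data != nullptr || bytes == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < bytes; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Returns true if `buffer` holds the same bytes after `work` returns as
// it did before. `work` is hypothetical: in a multi-GPU program it would
// start the host threads and return only after all of them, and the
// kernels they launched, have completed.
bool leaves_input_unchanged(
    float* buffer, std::size_t count,
    const std::function<void(float*, std::size_t)>& work) {
    const std::size_t bytes = count * sizeof(float);
    const std::uint64_t before = checksum(buffer, bytes);
    work(buffer, count);
    return checksum(buffer, bytes) == before;
}
```

If this returns false, something wrote to the buffer. It only tells you that the data changed, not who changed it, so wrapping smaller and smaller pieces of the work in the same check is one way to narrow it down. Passing the buffer to the worker code as `const float*` also lets the compiler reject accidental host-side writes.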

Do you mean that pinned memory cannot be used in -deviceemu mode?