My guess is that the GPU that is running out of memory is also driving an active display. There is a driver API function, cuMemGetInfo(), which you can use to check how much free memory is available on each device. You might be surprised how much memory a “modern” display manager (like WDDM + Aero, or Aqua, or X11 + Compiz) uses.
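Here is a rough sketch of that check. It is only an illustration, and it assumes you are on the runtime API, so it calls cudaMemGetInfo() (the runtime counterpart of cuMemGetInfo()) rather than the driver API function itself. The variable names and the output format are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        // Select the device for the calling host thread.
        err = cudaSetDevice(dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }

        // Ask the runtime how much memory is free on the selected device.
        size_t freeBytes = 0, totalBytes = 0;
        err = cudaMemGetInfo(&freeBytes, &totalBytes);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaMemGetInfo on device %d failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }

        std::printf("device %d: %.1f MiB free of %.1f MiB\n", dev,
                    freeBytes / (1024.0 * 1024.0),
                    totalBytes / (1024.0 * 1024.0));
    }
    return 0;
}
```

Build it with nvcc and run it before launching your own program. If the device that drives the display reports far less free memory than the other one, that would fit the guess above.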
Now I am sure that using OpenMP like that is not correct: the result is that the two threads allocate memory on the same device. But the SDK sample “cudaOpenMP” does it that way. What should I do if I want two threads to allocate memory on different devices?