Multi-GPU Programming - CUDA Error 400

We implemented a process that contains two threads in addition to the main thread. Each thread sets a different device:
thread 1 sets device 0 and thread 2 sets device 1.
The main thread allocates memory using cudaHostAlloc. This memory area is used by both threads.
Apart from this shared memory area, each thread allocates all the resources it needs on its own device.
Each time the second thread wakes up, it copies part of this memory with cudaMemcpyAsync on a stream created on device 1.
An event, also created on device 1, is recorded on that stream.
On every run, though not always in the same cycle, the cudaEventSynchronize call that waits on this event returns error 400.

Error 400 is cudaErrorInvalidResourceHandle ("invalid resource handle"). It generally means that you are attempting to use a resource (device memory allocation, event, or stream) on a device (or context, in the driver API) other than the one it was created on.
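Since the failure is reported at cudaEventSynchronize, it may help to check the return value of every runtime call so that the first failing call is visible. The following is only a sketch: the helper cudaOk and the macro CUDA_OK are hypothetical names, not part of the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): returns true on success; otherwise
// prints the failing expression, the error name, its numeric code and the
// runtime's description, then returns false.
inline bool cudaOk(cudaError_t err, const char* what, const char* file, int line) {
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s:%d: %s failed: %s (%d): %s\n",
                 file, line, what,
                 cudaGetErrorName(err), static_cast<int>(err),
                 cudaGetErrorString(err));
    return false;
}

// Hypothetical convenience macro that captures the call text and location.
#define CUDA_OK(call) cudaOk((call), #call, __FILE__, __LINE__)
```

Wrapping each call, for example `if (!CUDA_OK(cudaEventRecord(ev, stream))) return;`, shows whether the first error really comes from cudaEventSynchronize or from an earlier call on the same stream or event.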

One possibility is that you are using a stream on a device other than the one it was created on.
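A minimal sketch of keeping everything on one device follows. It assumes the worker thread calls cudaSetDevice itself before it creates or uses any resource, because the current device is per-host-thread state. The function name copyOnDevice is hypothetical, and it creates and destroys its resources on every call only to stay short; your code presumably creates them once per thread.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical worker body: everything is created and used after
// cudaSetDevice(device) in the same host thread, so the stream, the event
// and the device buffer all belong to the same device.
// hostSrc is assumed to be pinned memory obtained from cudaHostAlloc.
cudaError_t copyOnDevice(int device, const void* hostSrc, size_t bytes) {
    cudaError_t err = cudaSetDevice(device);
    if (err != cudaSuccess) return err;

    void* devDst = nullptr;
    cudaStream_t stream = nullptr;
    cudaEvent_t done = nullptr;

    err = cudaMalloc(&devDst, bytes);
    if (err == cudaSuccess) err = cudaStreamCreate(&stream);
    if (err == cudaSuccess) err = cudaEventCreateWithFlags(&done, cudaEventDisableTiming);
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(devDst, hostSrc, bytes, cudaMemcpyHostToDevice, stream);
    // The event and the stream must belong to the same device here.
    if (err == cudaSuccess) err = cudaEventRecord(done, stream);
    if (err == cudaSuccess) err = cudaEventSynchronize(done);

    // Release whatever was created, keeping the first error for the caller.
    if (done) cudaEventDestroy(done);
    if (stream) cudaStreamDestroy(stream);
    if (devDst) cudaFree(devDst);
    return err;
}
```

If a stream or event handle is instead created in one thread and passed to another, or created before cudaSetDevice was called in the creating thread, it may end up associated with device 0 even though it is later used with device 1.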

It is not possible to be definitive from a text description alone; that would take a short, complete example that reproduces the issue.