The enqueue() function takes a cudaEvent_t as an input, which signals to the caller when it is safe to refill the input buffers.
Is there some sort of signal that tells the caller when it is OK to call enqueue() again? Does the caller need to wait until the previous call to enqueue() has completed, or can enqueue() be called concurrently from two different host threads, each with its own set of input and output memory? Does it matter whether the calls to enqueue() use the same stream or different streams? And do the separate threads need to wait for the cudaEvent_t to signal that the inputs may be refilled?
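
To make the question concrete, here is a minimal sketch of the single-threaded handshake as I understand it. It uses only CUDA runtime calls. `fakeEnqueue` is a hypothetical stand-in for enqueue(), not the real API: it queues a device-to-device copy as a placeholder for the work that reads the input, then records the event. The buffer names and sizes are also just assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            std::exit(1);                                               \
        }                                                               \
    } while (0)

// Hypothetical stand-in for enqueue(): queues work on `stream` that reads
// `input`, then records `inputConsumed` on the same stream, so the event
// completes once that read has finished.
static void fakeEnqueue(const float* input, float* output, size_t n,
                        cudaStream_t stream, cudaEvent_t inputConsumed) {
    CHECK(cudaMemcpyAsync(output, input, n * sizeof(float),
                          cudaMemcpyDeviceToDevice, stream));
    CHECK(cudaEventRecord(inputConsumed, stream));
}

int main() {
    const size_t n = 1024;
    const size_t bytes = n * sizeof(float);

    float* dIn = nullptr;
    float* dOut = nullptr;
    CHECK(cudaMalloc(&dIn, bytes));
    CHECK(cudaMalloc(&dOut, bytes));

    cudaStream_t stream;
    cudaEvent_t inputConsumed;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaEventCreate(&inputConsumed));

    std::vector<float> host(n);
    for (int batch = 0; batch < 3; ++batch) {
        // Fill the host buffer and upload it as the next input.
        for (size_t i = 0; i < n; ++i) host[i] = static_cast<float>(batch);
        CHECK(cudaMemcpyAsync(dIn, host.data(), bytes,
                              cudaMemcpyHostToDevice, stream));

        fakeEnqueue(dIn, dOut, n, stream, inputConsumed);

        // Block the host until the queued work has finished reading dIn.
        // With a single stream, stream ordering alone would already protect
        // dIn; the wait is shown only to illustrate the host-side handshake.
        CHECK(cudaEventSynchronize(inputConsumed));
        std::printf("batch %d: input consumed, safe to refill\n", batch);
    }

    // Wait for all remaining queued work before releasing resources.
    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaEventDestroy(inputConsumed));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    return 0;
}
```

What I cannot tell from this picture is whether two host threads could each run this loop at the same time with their own buffers, streams, and events.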