Is it possible to use a CUDA stream and event that were created on the host inside a kernel on the device?
Something like this:
__global__ void kernel(..., cudaEvent_t event,
                       cudaStream_t stream) {
    // code
    cudaEventRecord(event, stream);
    // code
}
int main() {
    cudaStream_t stream {};
    cudaEvent_t event {};
    CUDA_CHECK(cudaStreamCreate(&stream));
    CUDA_CHECK(cudaEventCreate(&event));
    // arguments passed in the same order as the kernel parameters: event, then stream
    kernel<<<x, y>>>(..., event, stream);
}
Is this “legal”, that is, is it supposed to work?
I tried it, and it doesn’t work.
It seems that in-kernel stream and event calls only work with objects created on the device itself, i.e., a device-side cudaEventRecord must use an event and a stream that were both created on the device. Is this correct? I may have missed this in the documentation; if so, please point me to it for reference.
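For comparison, here is a minimal sketch of the device-only pattern described above, where the streams and the event are created, used, and destroyed inside the kernel. It assumes CUDA Dynamic Parallelism is available (compute capability 3.5 or newer, built with something like `nvcc -rdc=true example.cu -lcudadevrt`). The kernels `writeValue`, `addOne`, and `parent` are hypothetical names used only for illustration, and this is not a statement of what the documentation guarantees.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernels, present only so there is some work to order.
__global__ void writeValue(int *out) { *out = 41; }
__global__ void addOne(int *out) { *out += 1; }

// All stream and event handles are created on the device, inside the kernel.
__global__ void parent(int *out) {
    // Let a single thread do the stream/event work.
    if (blockIdx.x != 0 || threadIdx.x != 0) return;

    cudaStream_t s1, s2;
    cudaEvent_t ev;

    // The device runtime expects non-blocking streams and
    // events created with timing disabled.
    if (cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking) != cudaSuccess) return;
    if (cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking) != cudaSuccess) return;
    if (cudaEventCreateWithFlags(&ev, cudaEventDisableTiming) != cudaSuccess) return;

    // Work in s1, then an event recorded behind it.
    writeValue<<<1, 1, 0, s1>>>(out);
    cudaEventRecord(ev, s1);

    // Work in s2 waits on the event, so it runs after writeValue.
    cudaStreamWaitEvent(s2, ev, 0);
    addOne<<<1, 1, 0, s2>>>(out);

    // Release the device-side handles.
    cudaEventDestroy(ev);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
}

int main() {
    int *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(int));
    cudaMemset(d_out, 0, sizeof(int));

    // No host-created stream or event is passed to the kernel.
    parent<<<1, 1>>>(d_out);
    cudaError_t err = cudaDeviceSynchronize();

    int h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("status: %s, value: %d\n", cudaGetErrorString(err), h_out);

    cudaFree(d_out);
    return 0;
}
```

The point of the sketch is only that every handle passed to `cudaEventRecord` and `cudaStreamWaitEvent` in device code originates in the same kernel, in contrast to the host-created handles in the snippet above.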
Thank you