In my application, I have a producer thread and a consumer thread. The producer copies data host-to-device (H2D) into a device buffer, then passes the buffer address, along with the stream the consumer should wait on, to the consumer.
The consumer makes its own copy of this buffer (a device-to-device (D2D) copy enqueued on the received stream) and then processes it.
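The handoff described above can be sketched roughly as follows. This is a minimal single-file approximation, not my actual code: the Handoff struct, the process kernel, and run_handoff_demo are hypothetical names, and both threads live in one executable rather than in two .so files. The point it illustrates is that the consumer's D2D copy and kernel launch are enqueued on the producer's stream, so they are ordered after the producer's H2D copy without any extra synchronization.

```cuda
#include <cuda_runtime.h>
#include <condition_variable>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Abort with file/line and the error string if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(1);                                               \
        }                                                               \
    } while (0)

// Hypothetical message the producer hands to the consumer.
struct Handoff {
    const float* d_buf;   // producer's device buffer
    cudaStream_t stream;  // stream the H2D copy was enqueued on
    bool ready;
};

// Hypothetical "processing" step: doubles each element in place.
__global__ void process(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Runs one producer/consumer round; returns the number of wrong elements.
int run_handoff_demo(int n) {
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    Handoff msg{nullptr, nullptr, false};
    std::mutex m;
    std::condition_variable cv;
    float* d_src = nullptr;
    cudaStream_t s = nullptr;

    std::thread producer([&] {
        CUDA_CHECK(cudaStreamCreate(&s));
        CUDA_CHECK(cudaMalloc(&d_src, bytes));
        // H2D copy into the producer's buffer, on the producer's stream.
        CUDA_CHECK(cudaMemcpyAsync(d_src, h_in.data(), bytes,
                                   cudaMemcpyHostToDevice, s));
        std::lock_guard<std::mutex> lk(m);
        msg.d_buf = d_src;
        msg.stream = s;
        msg.ready = true;
        cv.notify_one();
    });

    std::thread consumer([&] {
        Handoff got;
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return msg.ready; });
            got = msg;
        }
        float* d_copy = nullptr;
        CUDA_CHECK(cudaMalloc(&d_copy, bytes));
        // Enqueued on the received stream, so it runs after the H2D copy.
        CUDA_CHECK(cudaMemcpyAsync(d_copy, got.d_buf, bytes,
                                   cudaMemcpyDeviceToDevice, got.stream));
        process<<<(n + 255) / 256, 256, 0, got.stream>>>(d_copy, n);
        CUDA_CHECK(cudaGetLastError());  // catches launch-time errors
        CUDA_CHECK(cudaMemcpyAsync(h_out.data(), d_copy, bytes,
                                   cudaMemcpyDeviceToHost, got.stream));
        CUDA_CHECK(cudaStreamSynchronize(got.stream));
        CUDA_CHECK(cudaFree(d_copy));
    });

    producer.join();
    consumer.join();
    CUDA_CHECK(cudaFree(d_src));
    CUDA_CHECK(cudaStreamDestroy(s));

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * h_in[i]) ++bad;
    return bad;
}

int main() {
    int bad = run_handoff_demo(1 << 16);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

In my real setup the consumer's kernel lives in a different shared library from the producer's code, which this single-file sketch does not reproduce.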
The problem is that when I launch the producer and the consumer together, I get cudaErrorInvalidKernel. However, if I let the producer run at least once before the consumer is launched, and only then launch the consumer, everything works fine. I also compared the first and second data/metadata passed from the producer to the consumer to see whether anything differs in the first execution, but everything was the same.
The consumer and the producer are built with separable compilation as two separate shared libraries (.so).
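To narrow down which call actually returns the error, one option is to probe the consumer's kernel before the first launch and to check launch-time and execution-time errors separately. The sketch below assumes a hypothetical consumer_kernel standing in for a kernel defined in the consumer .so. cudaFuncGetAttributes fails if the runtime cannot find a usable image for that kernel on the current device, and cudaGetErrorName prints the exact enum name of whatever error comes back, which helps confirm which error code is really being returned.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical stand-in for a kernel that lives in the consumer .so.
__global__ void consumer_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Returns cudaSuccess if the runtime can find the kernel for this device.
cudaError_t probe_kernel() {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, consumer_kernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "probe failed: %s (%s)\n",
                     cudaGetErrorName(err), cudaGetErrorString(err));
    } else {
        std::printf("consumer_kernel: %d registers, max %d threads/block\n",
                    attr.numRegs, attr.maxThreadsPerBlock);
    }
    return err;
}

// Launches the kernel and reports launch and execution errors separately.
cudaError_t launch_and_check(float* d, int n, cudaStream_t s) {
    consumer_kernel<<<(n + 255) / 256, 256, 0, s>>>(d, n);
    cudaError_t launch = cudaGetLastError();  // error at launch time
    if (launch != cudaSuccess) {
        std::fprintf(stderr, "launch: %s\n", cudaGetErrorName(launch));
        return launch;
    }
    cudaError_t exec = cudaStreamSynchronize(s);  // error during execution
    if (exec != cudaSuccess)
        std::fprintf(stderr, "execution: %s\n", cudaGetErrorName(exec));
    return exec;
}

int main() {
    if (probe_kernel() != cudaSuccess) return 1;
    const int n = 1024;
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));
    cudaError_t err = launch_and_check(d, n, 0);  // 0 = default stream
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling probe_kernel from the consumer thread right before its first launch, in both the failing and the working startup orders, would show whether the kernel is already unavailable at that point or whether the error only appears at launch.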