__threadfence() across kernels? Fermi


The manual says that __threadfence() waits until memory accesses made by the thread are visible to all the threads in the device. Does it guarantee this kind of coherency between threads in different kernels?
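
For reference, here is a minimal sketch of the single-kernel case the manual does cover, modeled on the "last block done" pattern. The names `blocksDone`, `sumPartials`, `partials` and `total` are made up for illustration, and each block simply publishes the value 1.0f. It says nothing about whether the same ordering holds between two different kernels, which is the question here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter: how many blocks have published their partial result.
__device__ unsigned int blocksDone = 0;

// Each block publishes one partial value; the last block to finish sums them.
__global__ void sumPartials(volatile float *partials, float *total)
{
    __shared__ bool isLast;

    if (threadIdx.x == 0) {
        partials[blockIdx.x] = 1.0f;   // publish this block's result
        __threadfence();               // order the store before the counter update
        unsigned int ticket = atomicInc(&blocksDone, gridDim.x);
        isLast = (ticket == gridDim.x - 1);
    }
    __syncthreads();

    if (isLast && threadIdx.x == 0) {
        // Every other block fenced before incrementing the counter,
        // so its partial result should be visible to this thread.
        float sum = 0.0f;
        for (unsigned int i = 0; i < gridDim.x; ++i)
            sum += partials[i];
        *total = sum;
        blocksDone = 0;                // reset for a later launch
    }
}

int main()
{
    const int blocks = 8;
    float *partials = 0, *total = 0, hostTotal = 0.0f;

    cudaMalloc(&partials, blocks * sizeof(float));
    cudaMalloc(&total, sizeof(float));

    sumPartials<<<blocks, 32>>>(partials, total);
    cudaMemcpy(&hostTotal, total, sizeof(float), cudaMemcpyDeviceToHost);
    printf("total = %f\n", hostTotal);

    cudaFree(partials);
    cudaFree(total);
    return 0;
}
```

Without the fence, the counter increment could become visible before the store to `partials`, and the last block might sum a stale value.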


It sounds like you’re trying to pass information between two concurrent kernels at runtime. You really, really shouldn’t do that.

Actually, I wonder whether I could have the host send notifications to a running kernel. A ‘notifier’ kernel could deliver the notification to the running kernel, if information can be passed between kernels on the device. Is there another way to pass notifications to a running kernel?


Use zero-copy (mapped pinned) memory, so your kernel reads host memory directly. If you mark the pointer volatile, it should fetch the value over the PCI-E bus on every read.
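
A rough sketch of what that could look like, assuming the device supports mapped pinned memory. The names `waitForFlag`, `hostFlag`, `devFlag`, `devSeen` and the spin limit are made up for illustration. The spin is bounded so the kernel always terminates, and nothing here guarantees that the kernel actually sees the host's write while it is running.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Polls a flag that lives in mapped host memory. The loop is bounded, so the
// kernel exits even if the host's write is never observed.
__global__ void waitForFlag(volatile int *flag, int *seen, int maxSpins)
{
    int value = 0;
    int spins = 0;
    do {
        value = *flag;                 // volatile: re-read on every iteration
    } while (value == 0 && ++spins < maxSpins);
    *seen = value;                     // last value read from the flag
}

int main()
{
    // Must be set before the context is created to allow mapped host memory.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int *hostFlag = 0, *devFlag = 0, *devSeen = 0, seen = 0;

    // Pinned host allocation that is also mapped into the device address space.
    cudaHostAlloc((void **)&hostFlag, sizeof(int), cudaHostAllocMapped);
    *hostFlag = 0;
    cudaHostGetDevicePointer((void **)&devFlag, hostFlag, 0);
    cudaMalloc(&devSeen, sizeof(int));

    // The launch is asynchronous, so the host continues immediately.
    waitForFlag<<<1, 1>>>(devFlag, devSeen, 1 << 24);

    *(volatile int *)hostFlag = 1;     // the "notification" from the host

    cudaDeviceSynchronize();
    cudaMemcpy(&seen, devSeen, sizeof(int), cudaMemcpyDeviceToHost);
    printf("kernel observed flag: %d\n", seen);

    cudaFree(devSeen);
    cudaFreeHost(hostFlag);
    return 0;
}
```

Whether the kernel reports 1 or 0 depends on timing and on the platform, so treat the result as an observation, not a contract.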

This is another one of those things you really shouldn’t depend on working…

Thanks tmurray

From a parallel programming perspective I agree wholeheartedly, but technically this is documented to work as far as I know (not that I rely on it ;)).

Nope, it’s actually not. Zero-copy to a running kernel is a bad idea because it is really tricky to do right and you may encounter chipset bugs. I don’t think we’ve ever claimed that you can communicate in that way.