I have a few questions for the experts:
Can I assume that GPU atomics on pinned (page-locked) host memory work consistently with CPU atomics on the same memory locations? Put another way: can I use pinned memory to build lock-free data structures shared between the GPU and the CPU?
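To make the first question concrete, here is a minimal sketch of the kind of mixed access I mean. It rests on several assumptions. It needs a device of compute capability 6.0 or higher, because atomicAdd_system() requires it (compile with e.g. -arch=sm_60). It needs a GCC or Clang host compiler for the __atomic_fetch_add / __atomic_load_n builtins. The sizes and function names are my own, made up for illustration. The GPU and the CPU increment the same counter in mapped pinned memory, and the program prints the final value next to the expected total. Whether those two numbers are guaranteed to match is exactly what I am asking. Note also that the kernel and the CPU loop are not guaranteed to overlap on any given run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes for the experiment.
constexpr int kBlocks = 16;
constexpr int kThreads = 128;
constexpr int kGpuItersPerThread = 100;
constexpr int kCpuIters = 1000000;
constexpr int kExpected = kBlocks * kThreads * kGpuItersPerThread + kCpuIters;

__global__ void gpu_increment(int* counter)
{
    for (int i = 0; i < kGpuItersPerThread; ++i)
        atomicAdd_system(counter, 1);  // system-scope atomic (compute capability 6.0+)
}

// Returns the final counter value, or -1 if a CUDA call failed.
int run_mixed_atomics()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);  // allow mapping pinned host memory

    int* h_counter = nullptr;
    if (cudaHostAlloc((void**)&h_counter, sizeof(int), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    *h_counter = 0;

    int* d_counter = nullptr;
    if (cudaHostGetDevicePointer((void**)&d_counter, h_counter, 0) != cudaSuccess) {
        cudaFreeHost(h_counter);
        return -1;
    }

    gpu_increment<<<kBlocks, kThreads>>>(d_counter);  // asynchronous launch
    if (cudaGetLastError() != cudaSuccess) {
        cudaFreeHost(h_counter);
        return -1;
    }

    // CPU side: atomic increments on the same location while the kernel may still run.
    // __atomic_fetch_add is a GCC/Clang builtin (assumed host compiler).
    for (int i = 0; i < kCpuIters; ++i)
        __atomic_fetch_add(h_counter, 1, __ATOMIC_SEQ_CST);

    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFreeHost(h_counter);
        return -1;
    }
    int result = __atomic_load_n(h_counter, __ATOMIC_SEQ_CST);
    cudaFreeHost(h_counter);
    return result;
}

int main()
{
    int observed = run_mixed_atomics();
    printf("observed %d, expected %d\n", observed, kExpected);
    return observed < 0 ? 1 : 0;
}
```

Lost updates could only make the observed value smaller than the expected one, never larger.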
Can I write a value to pinned memory from a kernel and rely on it becoming visible to the CPU before the kernel finishes? Do I need to issue any of the __threadfence calls (e.g. __threadfence_system())?

I have a long-running kernel that reports its progress to the CPU stage by stage, so that the CPU can start its own work as soon as each kernel stage completes. For this I was planning to set status flags in pinned memory that the CPU polls. Will this approach work? I need the CPU to react faster than it would by waiting for the whole kernel to complete.
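For the second question, here is a minimal sketch of the flag scheme I have in mind. It is my own guess at the pattern, not something I know to be correct. A single-block kernel writes each stage's results and then the stage number into mapped pinned memory. Every thread calls __threadfence_system() and then __syncthreads() before the flag is written, while the host spins on the flag. I use a single block so that one __syncthreads() covers all workers. The stage count, block size, spin length and all names are made up. The clock64() busy-wait stands in for real work. The host uses the GCC/Clang __atomic_load_n builtin as an acquire load. With more than one block, each block would need its own flag, or a grid-wide counter, before a stage could be reported as done.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical parameters for the sketch.
constexpr int kStages = 4;
constexpr int kThreads = 128;
constexpr long long kSpinCycles = 10000000LL;  // stand-in for real work per stage

// Single-block kernel that publishes its progress through a flag in pinned memory.
__global__ void staged_kernel(volatile int* stage_flag, int* results)
{
    for (int stage = 1; stage <= kStages; ++stage) {
        // Placeholder work: busy-wait, then write this stage's slice of the results.
        long long t0 = clock64();
        while (clock64() - t0 < kSpinCycles) { }
        results[(stage - 1) * kThreads + threadIdx.x] = stage * 1000 + threadIdx.x;

        __threadfence_system();  // my understanding: orders this thread's result write
                                 // before its later writes, as seen by the host too
        __syncthreads();         // every thread in the block has finished this stage

        if (threadIdx.x == 0)
            *stage_flag = stage;  // volatile store: publish "stage done"
    }
}

// Returns the number of stages whose results the host verified, or -1 on a CUDA error.
int run_staged_demo()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);  // allow mapping pinned host memory

    int* h_flag = nullptr;
    int* h_results = nullptr;
    if (cudaHostAlloc((void**)&h_flag, sizeof(int), cudaHostAllocMapped) != cudaSuccess)
        return -1;
    if (cudaHostAlloc((void**)&h_results, kStages * kThreads * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) {
        cudaFreeHost(h_flag);
        return -1;
    }
    *h_flag = 0;

    int* d_flag = nullptr;
    int* d_results = nullptr;
    if (cudaHostGetDevicePointer((void**)&d_flag, h_flag, 0) != cudaSuccess ||
        cudaHostGetDevicePointer((void**)&d_results, h_results, 0) != cudaSuccess) {
        cudaFreeHost(h_results);
        cudaFreeHost(h_flag);
        return -1;
    }

    staged_kernel<<<1, kThreads>>>(d_flag, d_results);  // asynchronous launch
    if (cudaGetLastError() != cudaSuccess) {
        cudaFreeHost(h_results);
        cudaFreeHost(h_flag);
        return -1;
    }

    int verified = 0;
    for (int stage = 1; stage <= kStages; ++stage) {
        // Spin until the kernel reports this stage (or a later one). __atomic_load_n is a
        // GCC/Clang builtin (assumed host compiler); acquire ordering keeps the result
        // reads below from being moved above this load.
        while (__atomic_load_n(h_flag, __ATOMIC_ACQUIRE) < stage) { }

        bool ok = true;
        for (int i = 0; i < kThreads; ++i)
            ok = ok && h_results[(stage - 1) * kThreads + i] == stage * 1000 + i;
        if (ok) ++verified;
        // ... CPU-side work for this stage would start here ...
    }

    cudaDeviceSynchronize();
    cudaFreeHost(h_results);
    cudaFreeHost(h_flag);
    return verified;
}

int main()
{
    int verified = run_staged_demo();
    printf("stages verified on the host: %d of %d\n", verified, kStages);
    return verified == kStages ? 0 : 1;
}
```

The host compares with `<` rather than waiting for an exact value, so it cannot miss a stage if the kernel gets ahead of it. A real version would also want a timeout in the polling loop, since a kernel fault would otherwise leave the host spinning forever.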