Writing a 32-bit value from the GPU to a memory location not allocated by the CUDA API

Hello,

We need a GPU program to write a 32-bit value to a memory location that the host program provides. The target address is in external memory located on a PCI device and mapped into system memory via one of the device’s BARs.
We are looking for a driver API function that can be associated with a CUDA stream and lets us perform this write.

Can you please advise?

Thank you.