cudaMallocHost memory access from device

CUDA Toolkit Version: 8.0.6.1
IDE: Visual Studio 2015 Community

Tutorials I looked at:
http://cedric-augonnet.com/accessing-pinned-host-memory-directly-from-the-device/

Manual Entry:
http://horacio9573.no-ip.org/cuda/group__CUDART__MEMORY_g9f93d9600f4504e0d637ceb43c91ebad.html

So, enough of the context.

I tried to work it out from the documentation, which says:
‘Allocates size bytes of host memory that is page-locked and accessible to the device.’
So I thought, possibly naively, that I could use it like a normal variable that is shared between host and device.

I wrote my own programs, but every time I tried to access the variable directly in the kernel function, the kernel failed.

So my question is:

What do I have to do to access the pinned memory from the device? Is it even possible, as shown in the examples above?

From my observations, it seems as if cudaMallocHost is just a replacement for malloc that speeds up cudaMemcpy between host and device.
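To make that observation concrete, here is a rough sketch of the staging-buffer pattern it describes: the pinned buffer is only ever touched by the host and by cudaMemcpy, while the kernel works on a separate device allocation. This is not code from my actual programs; the kernel `square` and the names `h_buf` and `d_buf` are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: squares each element in place in device memory.
__global__ void square(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float *h_buf = nullptr;  // pinned (page-locked) host buffer
    float *d_buf = nullptr;  // ordinary device buffer

    cudaMallocHost((void **)&h_buf, bytes);  // used here in place of malloc
    cudaMalloc((void **)&d_buf, bytes);

    for (int i = 0; i < n; ++i) h_buf[i] = (float)i;

    // The kernel never sees h_buf; data moves through explicit copies.
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    square<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);

    printf("h_buf[3] = %f\n", h_buf[3]);

    cudaFree(d_buf);
    cudaFreeHost(h_buf);  // pinned memory is released with cudaFreeHost
    return 0;
}
```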

TL;DR

How do I use cudaMallocHost properly for a result array?

BR,
Sebastian.

Yes, it is possible.
There are CUDA sample codes, plus probably dozens of already-written examples on the web, that demonstrate this.

Since those examples apparently are not working for you, I fail to see how yet another example will help.

Perhaps you should write a very simple program that does what you describe - I could write it in probably less than 20 lines of code. If it works, start there. If it does not work, and you post the complete example here, someone will probably be able to help you.
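For what it is worth, a very simple program of that kind might look like the sketch below. It is only an illustration under a few assumptions: it uses cudaHostAlloc with the cudaHostAllocMapped flag plus cudaHostGetDevicePointer rather than cudaMallocHost, because that route does not rely on unified virtual addressing (which, as far as I know, needs a 64-bit build among other things). The kernel name `fill`, the variable names and the `CHECK` macro are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the failing call and bail out of main.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Writes results straight into mapped, pinned host memory.
__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * i;
}

int main()
{
    const int n = 256;
    int *h_out = nullptr;  // host view of the pinned result array
    int *d_out = nullptr;  // device view of the same memory

    // Request mapped pinned memory support before the context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    CHECK(cudaHostAlloc((void **)&h_out, n * sizeof(int), cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&d_out, h_out, 0));

    fill<<<(n + 127) / 128, 128>>>(d_out, n);
    CHECK(cudaGetLastError());       // reports launch errors
    CHECK(cudaDeviceSynchronize());  // wait before reading on the host

    printf("h_out[10] = %d\n", h_out[10]);

    CHECK(cudaFreeHost(h_out));
    return 0;
}
```

If this works on your machine, start from there and change it step by step towards your own program. Checking the return value of every runtime call, as the sketch does, usually shows which call is the one that actually fails.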