cudaHostRegister breaks data

Hi everyone,

I have a strange problem with cudaHostRegister(). I allocate some memory with new and later turn it into pinned memory with cudaHostRegister(). When I then copy the data to the GPU using cudaMemcpyAsync(), the data arrives corrupted: there are often (not always) blocks of random data in it. Sometimes it is data that was written to my host memory some time before the call to cudaHostRegister().
When I use cudaMallocHost() instead of new/cudaHostRegister(), I do not observe this problem.
This is how I allocate the memory:
complex* data = (complex*)new char[sizeof(complex)*sampleCount];
And then mark it as pinned memory:
cudaHostRegister(data, sizeof(complex)*sampleCount, 0);
I check the return value of every CUDA call, but no errors are reported.
I read somewhere that you cannot use new/malloc to allocate memory that will be passed to cudaHostRegister(), because the memory has to be page-aligned. Those posts were very old (around CUDA 4.0) and I could not find anything about this in the current documentation, so I assume this was an old restriction that no longer applies. I also tried using valloc instead of new, but the problem remains.
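For reference, a page-aligned allocation of the kind mentioned above could be sketched roughly as follows. This is only a minimal sketch assuming a POSIX system; the helper names (roundUpToPage, allocPageAligned, isPageAligned) are made up for illustration and are not part of CUDA or any library, and the cudaHostRegister() call is shown only as a comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unistd.h>

// Hypothetical helper: round a byte count up to a whole number of pages.
std::size_t roundUpToPage(std::size_t bytes, std::size_t pageSize) {
    return (bytes + pageSize - 1) / pageSize * pageSize;
}

// Hypothetical helper: allocate a buffer that starts on a page boundary and
// spans whole pages. Returns nullptr on failure; release with std::free().
void* allocPageAligned(std::size_t bytes, std::size_t* paddedBytes) {
    const std::size_t pageSize =
        static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t padded = roundUpToPage(bytes, pageSize);
    void* ptr = nullptr;
    if (posix_memalign(&ptr, pageSize, padded) != 0) {
        return nullptr;
    }
    if (paddedBytes != nullptr) {
        *paddedBytes = padded;  // size that was actually allocated
    }
    return ptr;
}

// Hypothetical helper: true if ptr lies exactly on a page boundary.
bool isPageAligned(const void* ptr) {
    const std::size_t pageSize =
        static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    return reinterpret_cast<std::uintptr_t>(ptr) % pageSize == 0;
}

// Assumed usage (not compiled here, needs the CUDA runtime):
//   std::size_t padded = 0;
//   void* buf = allocPageAligned(sizeof(complex) * sampleCount, &padded);
//   cudaHostRegister(buf, padded, 0);
```

isPageAligned() can also be applied to a pointer obtained from new or from a library, to see whether alignment differs between the working and the failing case.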
When I use new without cudaHostRegister(), everything works fine, but cudaMemcpyAsync() is then no longer asynchronous, and I need it to be.
I also cannot use cudaMallocHost() everywhere, because I will also be working with memory that has been allocated by a library (Armadillo).
What could cause this problem and how can I fix it?