Problem with CUDA streams: only part of the data is being processed


I’m trying to write a program that uses streams. I know that cudaMallocHost not only allocates host memory but also page-locks it. My concern is whether there is an upper limit on how much host memory can be page-locked. Say the data set I want to traverse is very large, around 500 MB: is there any problem with calling cudaMallocHost for 500 MB? And how can I make sure the memory has actually been page-locked?
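
For reference, this is roughly the allocation I have in mind. It is only a minimal sketch: the helper name allocPinned and the exact 500 MB size are placeholders of mine, and I am assuming that a cudaSuccess return code means the buffer really was page-locked, which is the part I would like confirmed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): tries to allocate `bytes` of
// page-locked host memory. Returns true and sets *ptr on success;
// on failure prints the CUDA error string and sets *ptr to nullptr.
bool allocPinned(void** ptr, size_t bytes) {
    cudaError_t err = cudaMallocHost(ptr, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocHost(%zu bytes) failed: %s\n",
                bytes, cudaGetErrorString(err));
        *ptr = nullptr;
        return false;
    }
    return true;
}

int main() {
    // Placeholder size: roughly the 500 MB data set mentioned above.
    const size_t bytes = 500ull * 1024 * 1024;

    void* h_data = nullptr;
    if (!allocPinned(&h_data, bytes)) {
        return 1;  // the request was refused; no buffer to release
    }
    printf("allocated %zu bytes of host memory via cudaMallocHost\n", bytes);

    // Memory from cudaMallocHost must be released with cudaFreeHost, not free().
    cudaFreeHost(h_data);
    return 0;
}
```

Checking the return code at least tells me whether the request was refused outright, but it does not tell me what the practical limit is on a given machine.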

Thanks very much.