I’ve encountered several problems with pinned memory on Windows 7.
If I allocate a lot of pinned memory (950 MB, for example), CUDA contexts can no longer be created: cuCtxCreate returns “out of memory”. My application becomes unstable after that.
If I allocate more than 470 MB of pinned memory, the computations become slightly slower. If more than 512 MB of pinned memory is in use, the slowdown is up to a factor of two. This only happens if I use two CUDA contexts in two CPU threads simultaneously. If I use only one context, there is no slowdown. There are no CPU or GPU memory reallocations during computation, so the context-switching process must be taking a very long time.
cuMemHostAlloc (even for 512 KB) sometimes takes 100-200 milliseconds once I have already allocated more than 512 MB of pinned memory. If I do 20-30 such allocations, the application just hangs for several seconds.
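For anyone who wants to check the allocation latency on their own system, here is a minimal sketch of how the per-call time could be measured. It is not my production code. The names host_alloc, host_free, now_ms and time_allocs and the USE_CUDA_DRIVER macro are made up for this illustration. With USE_CUDA_DRIVER defined it calls cuMemHostAlloc with flags 0 and assumes the caller has already called cuInit and made a context current; without it, it falls back to malloc so the harness still builds without the CUDA toolkit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#ifdef USE_CUDA_DRIVER  /* hypothetical build switch, not a CUDA macro */
#include <cuda.h>
/* Pinned path: assumes cuInit() was called and a context is current. */
static void *host_alloc(size_t bytes) {
    void *p = NULL;
    return cuMemHostAlloc(&p, bytes, 0) == CUDA_SUCCESS ? p : NULL;
}
static void host_free(void *p) { cuMemFreeHost(p); }
#else
/* Fallback (pageable memory) so the harness builds without the toolkit. */
static void *host_alloc(size_t bytes) { return malloc(bytes); }
static void host_free(void *p) { free(p); }
#endif

/* Wall-clock time in milliseconds (C11 timespec_get). */
static double now_ms(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

/* Allocates up to `count` blocks of `bytes` each and stores the latency of
   every attempted call in ms[]. Stops at the first failed allocation, frees
   everything, and returns the number of successful allocations. */
static size_t time_allocs(size_t count, size_t bytes, double *ms) {
    assert(ms != NULL && bytes > 0);
    void **blocks = calloc(count, sizeof *blocks);
    if (blocks == NULL) return 0;
    size_t ok = 0;
    for (size_t i = 0; i < count; ++i) {
        double t0 = now_ms();
        blocks[i] = host_alloc(bytes);
        ms[i] = now_ms() - t0;
        if (blocks[i] == NULL) break;
        ++ok;
    }
    for (size_t i = 0; i < ok; ++i) host_free(blocks[i]);
    free(blocks);
    return ok;
}
```

To approximate the situation described above, one would first pin more than 512 MB, then call something like time_allocs(30, 512 * 1024, ms) and look at the individual entries of ms.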
All these problems lower the practical limit for pinned memory to about 400 MB, which is very little for the video-processing applications I develop. For instance, when processing HD video, every frame takes 24 MB, and I need to keep 10 or more of them in pinned memory simultaneously.
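To put numbers on that, here is a small sketch of the budget arithmetic using only the figures above (24 MB per frame, a 400 MB practical limit). The helper names pinned_mb and max_frames are made up for this illustration.

```c
#include <assert.h>

/* Total pinned memory, in MB, for `frames` frames of `frame_mb` MB each. */
static unsigned pinned_mb(unsigned frames, unsigned frame_mb) {
    return frames * frame_mb;
}

/* Largest number of whole frames that fit into `limit_mb` MB. */
static unsigned max_frames(unsigned limit_mb, unsigned frame_mb) {
    assert(frame_mb > 0);
    return limit_mb / frame_mb;
}
```

With these figures, pinned_mb(10, 24) is 240 and max_frames(400, 24) is 16, so the minimum of 10 frames already uses more than half of the usable budget, and no more than 16 frames fit at all.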
Is there any workaround for these problems? In some configurations pinned memory brings a significant performance improvement, but given the instability it may introduce, I will have to disable it unless the problems described above are solved.