A custom device driver allocates memory with dma_alloc_coherent() and exposes it to userland by calling remap_pfn_range() in its .mmap handler. A user program mmap()s the device, then pins and unpins the memory with cudaHostRegister()/cudaHostUnregister(). After the file descriptor of the custom device is closed, dmesg reports many failures while releasing the pages, because their refcount is negative (-1023).
I prepared a small repository that reproduces the problem.
It basically consists of a device driver and a userland test that triggers the memory leak. It is easy to build and try.
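I think the negative value is closely related to GUP_PIN_COUNTING_BIAS (1024): a page with a refcount of 1 that is unpinned once without ever being pinned would end up at 1 - 1024 = -1023. After tracing the kernel with ftrace, I found: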
- cudaHostRegister() is not calling pin_user_pages()
- cudaHostUnregister() is calling os_unpin_user_pages → unpin_user_pages
I don't know why cudaHostRegister() is not actually pinning the pages. Also, why is cudaHostUnregister() calling unpin_user_pages() on pages that were never pinned?
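To make the suspected arithmetic concrete, here is a minimal userland model of the refcount bookkeeping. It is only a sketch, not kernel code: struct model_page and the model_* functions are hypothetical names invented for illustration, and it assumes an order-0 page whose refcount starts at 1 and that each pin/unpin adjusts the refcount by GUP_PIN_COUNTING_BIAS.

```c
#include <assert.h>
#include <stdio.h>

/* Same value as the kernel's GUP_PIN_COUNTING_BIAS (1 << 10); redefined
 * here because this is a userland model, not kernel code. */
#define GUP_PIN_COUNTING_BIAS 1024

/* Hypothetical stand-in for struct page: only the reference count. */
struct model_page {
    int refcount;
};

/* Models a pin: the refcount grows by the bias. */
void model_pin(struct model_page *p)
{
    p->refcount += GUP_PIN_COUNTING_BIAS;
}

/* Models an unpin: the refcount shrinks by the bias. */
void model_unpin(struct model_page *p)
{
    p->refcount -= GUP_PIN_COUNTING_BIAS;
}

/* Expected sequence: pin on register, unpin on unregister. */
int refcount_after_balanced_pin_unpin(int start_refcount)
{
    struct model_page p = { start_refcount };

    assert(start_refcount > 0); /* a live page holds at least one reference */
    model_pin(&p);
    model_unpin(&p);
    return p.refcount;
}

/* Sequence seen in the trace: no pin on register, unpin on unregister. */
int refcount_after_unpin_only(int start_refcount)
{
    struct model_page p = { start_refcount };

    assert(start_refcount > 0);
    model_unpin(&p);
    return p.refcount;
}

/* Prints both outcomes for a page that starts with one reference. */
void print_model(void)
{
    printf("balanced pin/unpin: %d\n", refcount_after_balanced_pin_unpin(1));
    printf("unpin without pin:  %d\n", refcount_after_unpin_only(1));
}
```

Calling print_model() from a main() shows the balanced case returning to the starting refcount, while the unpin-only case lands on the same -1023 that dmesg reports. This matches the value in the logs but does not by itself prove that this is what the NVIDIA driver does.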
Stop the memory leak