Global Memory Usage in K40

I am working on a compression algorithm on a K40. I ran into some issues while figuring out the maximum file size that I can process at a time in global memory. My code breaks when I use file sizes > 800 MB, yet cudaMemGetInfo(&mem_free, &mem_total) reports a mem_free value that is greater than 2K bytes after I transfer my > 800 MB file to global memory.
Can someone shed light on this?

It is not clear to me what the exact context and memory sizes are here (consider posting a repro program and stating the exact actual sizes encountered), but I would suggest considering these possibilities:

(1) Memory allocations may be made with a certain granularity, e.g. in multiples of a page size
(2) Memory may be fragmented, causing the largest free block to be much smaller than the total amount of memory still available

Another common mistake is using 32-bit variables (e.g. “int”) for mem_free and mem_total rather than 64-bit ones (e.g. “size_t”), but I can’t make heads or tails of the claims or asks being presented here.

I use size_t and print using %u to get the following data.

Exact Numbers below:
Before Transfer,
Total GPU Mem. : 12079 MB
Total GPU Mem. Available : 11931 MB

Case 1,
File size : 800 MB
After Transfer,
Total GPU Mem. Available : 3842 MB
Status : Success

Case 2,
File size : 1000 MB
After Transfer,
Total GPU Mem. Available : 6116 MB
Status : Failure (An illegal memory access was encountered)

The fact that the available memory has increased in Case 2 in spite of transferring a bigger file makes it look even more fishy.
P.S. The total transfer size includes other info along with the file.

Hope this info is sufficient.
Appreciate your help.

%u is incorrect for size_t; it should be %zu (or %lu on platforms such as 64-bit Linux, where size_t is unsigned long).