cudaMemGetInfo returns wrong amount free memory

kmccall · December 10, 2012, 5:06pm

I might be doing something wrong, but it appears that cudaMemGetInfo() doesn’t always return the correct amount of free memory. In the code below, I call cudaMemGetInfo() before and after cudaMalloc() and report the expected and actual memory allocated. I’m not surprised if a the amount allocated is a little bigger that what was requested, due to alignment restrictions, but here it often reports that 100 times the amount requested is allocated, or sometimes 0 bytes.

int main(void)
{
    // the number of cudaMalloc() calls to make
    const int n_alloc = 10;

    cudaError_t err;
    size_t free_bytes, total_bytes;
    long free_bytes_before, free_bytes_after;
    int *d_array[n_alloc];

    // arbitrary allocation sizes
    size_t bytes_to_alloc[n_alloc] = { 1689600, 112640, 600, 600, 600, 1800, 1800, 4, 4, 11232 };

    err = cudaSetDevice(0);
    for (int i = 0; i < n_alloc; i++)
    {
        // get the number of bytes available before allocation
        deviceSynchronize();
        free_bytes = getFreeBytes(2*i, &total_bytes);
        free_bytes_before = (long) free_bytes;
        if (!i) cout << "total bytes available = " << total_bytes << endl << endl;

        // make sure we are allocating an integral number of ints
        assert(!(bytes_to_alloc[i] % sizeof(int)));

        err = cudaMalloc(&d_array[i], bytes_to_alloc[i]);
        if (err != cudaSuccess)
        {
            cout << "cudaMalloc returned the error " << cudaGetErrorString(err) << endl;
            exit(1);
        }

        // get the number of bytes available after allocation
        deviceSynchronize();
        free_bytes = getFreeBytes(2*i+1, &total_bytes);
        free_bytes_after = (long) free_bytes;

        // expected bytes that were to be allocated, and the actual amount that was allocated
        long expected = (long) bytes_to_alloc[i];
        long actual = free_bytes_before - free_bytes_after;
        long diff = actual - expected;
        cout << "Expected bytes = " << expected << "	 actual bytes = " << actual
             << "	 difference = " << diff << endl;
    }
}

// -------------------------------------------------------------------------------------------
// getFreeBytes
// -------------------------------------------------------------------------------------------
size_t getFreeBytes(const int where, size_t *total_bytes)
{
    size_t free_bytes;

    cudaError_t err = cudaMemGetInfo(&free_bytes, total_bytes);
    if (err != cudaSuccess)
    {
        cout << "getFreeBytes: call index " << where
            << ": cudaMemGetInfo returned the error: " << cudaGetErrorString(err) << endl;
        exit(1);
    }
    return free_bytes;
}

// -------------------------------------------------------------------------------------------
// deviceSynchronize
// -------------------------------------------------------------------------------------------
void deviceSynchronize(void)
{
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
    {
        cout << "cudaDeviceSynchronize returned the error " << cudaGetErrorString(err) << endl;
        exit(1);
    }
}

Here is the program output:

total bytes available = 3220373504

Expected bytes = 1689600 actual bytes = 1703936 difference = 14336
Expected bytes = 112640 actual bytes = 1048576 difference = 935936
Expected bytes = 600 actual bytes = 1048576 difference = 1047976
Expected bytes = 600 actual bytes = 0 difference = -600
Expected bytes = 600 actual bytes = 0 difference = -600
Expected bytes = 1800 actual bytes = 1048576 difference = 1046776
Expected bytes = 1800 actual bytes = 0 difference = -1800
Expected bytes = 4 actual bytes = 0 difference = -4
Expected bytes = 4 actual bytes = 0 difference = -4
Expected bytes = 11232 actual bytes = 1048576 difference = 1037344

I am using CUDA 5.0, driver 304.64 and a Tesla K10.G1.8GB card. This also happen in CUDA 4.2 with the 295.53 driver

eelsen · December 10, 2012, 6:22pm

I don’t think that it is reporting a “wrong” amount, just that what it is reporting is different from what you are expecting. Memory cannot be allocated in arbitrarily small chunks - a page is the minimum amount of memory than can be allocated (usually at least 4KB). Sometimes cudaMalloc can give you a pointer without actually allocating any new memory - it can use part of a page that has already been allocated. Judging from your experiment it looks like cudaMalloc chooses to allocate 1MB for very small allocations and then starts to fill that chunk in without actually making further allocations. I don’t know why it sometimes chooses to allocate another 1MB even though it clearly has room in a previous chunk.

kmccall · December 10, 2012, 7:18pm

Thanks a lot! Do you consider the fact that it “allocates another 1Mb even though it has room in a previous chunk” to be a bug? That really makes it difficult to predict how much memory I will need given my problem parameters and the amount of memory already in use. I dynamically size my device arrays based on such a prediction.

Maybe I could allocate a minimum of 4 Kb for each variable, even for the scalars and the small arrays. Perhaps the driver would allocate exactly that amount for each variable and not try to outsmart me.
Thoughts?

eelsen · December 11, 2012, 1:01am

I highly doubt it is a bug. It is trying to help you out, not get in your way. It looks like the algorithm might be deciding to allocate new chunks when the size of the requested allocation changes (but not for extremely small chunks like 4 bytes). You aren’t “losing” any memory because of this, it will eventually fill in all the unused spots.

I wouldn’t recommend trying to do what you are suggesting. If there are multiple threads/processes running then even after you call cudaMemGetInfo, something else may allocate memory before you finish allocating all of yours and your prediction could be way off. The only way to know for sure is making sure each allocation has succeeded.

One solution is to allocate as large a chunk of memory as you can (use a loop with ever decreasing size until it succeeds). Then you can manage the memory explicitly yourself knowing exactly how much you have available.

Topic		Replies	Views
Accurately determining available global memory on a CUDA device CUDA Programming and Performance	2	14392	April 11, 2011
cudaMallocManaged malloc memory not same as requested CUDA Programming and Performance	7	794	April 17, 2019
CudaMalloc on Vista : strange behaviour Works on XP, Fails on Vista CUDA Programming and Performance	6	12258	July 1, 2009
used memory on device how much memory does device allocate after calling cudamalloc? CUDA Programming and Performance	1	5383	September 8, 2009
cudaMemGetInfo() doesn't work correctly when explicitly setting heap size for device dynamic mem CUDA Programming and Performance	2	1830	August 17, 2011
Memory allocation : strange behavior CUDA Programming and Performance	4	2554	March 4, 2008
cudaMalloc fails after CUDA Programming and Performance	6	7690	June 18, 2012
memory allocation problem CUDA Programming and Performance	2	4806	September 8, 2009
cudaMallocManaged allocating more memory than requested CUDA Programming and Performance	7	3139	July 13, 2018
cudaMalloc failed with unknown error after only 491656bytes CUDA Programming and Performance	9	4372	July 2, 2009

cudaMemGetInfo returns wrong amount free memory

Related topics