Memory allocation: strange behavior


Yesterday I posted a topic about a memory allocation problem.
The problem is the following: I make 4 cudaMalloc calls. The total memory allocated is a function of 3 variables, N, M, and D. I call my function with different values of M (from 256 to 3072), with N=38400 and D=96.

Before, when I used M=2560, the total memory allocated was 410MB, and one of the 4 allocations tried to allocate 394MB. That allocation failed even though 750MB of memory was free.
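For reference, these sizes can be reproduced with a few lines of host code. This is only a sketch: it assumes 4-byte floats and that the four buffers hold N x D, M x D, M x N and M values respectively (an inferred layout). The names `BufferSizes` and `buffer_sizes` are made up for illustration.

```cuda
#include <cstddef>

// Sizes in bytes of the four device buffers.
// Assumption: the buffers hold N x D, M x D, M x N and M floats respectively.
struct BufferSizes {
    std::size_t ref;     // N x D floats
    std::size_t query;   // M x D floats
    std::size_t dist;    // M x N floats (the large one)
    std::size_t output;  // M floats

    std::size_t total() const { return ref + query + dist + output; }
};

// Compute the buffer sizes for given N, M and D.
inline BufferSizes buffer_sizes(std::size_t N, std::size_t M, std::size_t D) {
    const std::size_t elem = sizeof(float);  // 4 bytes on CUDA platforms
    return BufferSizes{N * D * elem, M * D * elem, M * N * elem, M * elem};
}
```

With N=38400, M=2560 and D=96 this gives 393,216,000 bytes for the M x N buffer and 408,954,880 bytes in total, which matches the roughly 394MB and 410MB above when 1MB is read as 10^6 bytes.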

Now I check the value returned by cudaMalloc to handle errors during memory allocation. I also do the biggest allocation first. With these small modifications, the memory allocations seem to work well.

I don’t understand why the memory allocation failed.

Do you think that the order of the cudaMalloc calls can change anything?
Has anybody else had this kind of problem?


Having free memory and having contiguous free memory are two different things. There should be a better interface than “cuMemGetInfo” or whatever… to find out the maximum free chunk size available. There probably is one… I don’t know.
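As far as I know, the CUDA runtime has no call that reports the largest contiguous free block; `cudaMemGetInfo` only returns the free and total byte counts. One workaround is to probe with trial allocations. The sketch below bisects on the size of a single `cudaMalloc`. The helper names `try_alloc` and `largest_allocatable` are hypothetical, and the bisection assumes that if an allocation of some size succeeds, any smaller one would succeed too, which the allocator does not guarantee.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Try a single allocation of `bytes` and release it again immediately.
inline bool try_alloc(std::size_t bytes) {
    void *p = nullptr;
    if (cudaMalloc(&p, bytes) == cudaSuccess) {
        cudaFree(p);
        return true;
    }
    cudaGetLastError();  // clear the allocation error so later calls start clean
    return false;
}

// Estimate (to within `granularity` bytes) the largest single cudaMalloc
// that currently succeeds. Returns 0 if the query or every probe fails.
inline std::size_t largest_allocatable(std::size_t granularity = 1 << 20) {
    if (granularity == 0) granularity = 1;

    std::size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        cudaGetLastError();
        return 0;
    }
    if (free_bytes == 0) return 0;
    if (try_alloc(free_bytes)) return free_bytes;  // all free memory is usable

    std::size_t lo = 0;           // largest size known to succeed
    std::size_t hi = free_bytes;  // smallest size known to fail
    while (hi - lo > granularity) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (try_alloc(mid)) lo = mid; else hi = mid;
    }
    return lo;
}
```

Calling `largest_allocatable()` just before the big allocation gives a rough idea of whether it can succeed. The result is only a snapshot, and every probe is a real allocation, so this is something for debugging rather than for production code.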

And certain things could depend on the memory allocator’s algorithm itself. It all depends on the NVIDIA driver’s allocation strategy.

Hmm… I think you are right. 400MB is more than half of the memory size. It seems logical that it is hard to find this much free contiguous memory after 2 cudaMalloc calls (of 15MB and 1MB respectively).
Thanks for your helpful answer!!!

Thanks for your words. But I still think that if you have 700MB free, I would expect 400MB of it to be contiguous… Hmm… The answer really depends on the memory allocator’s algorithm.

Actually, I do this:

    // Allocate CUDA memory
    cudaMallocPitch( (void **) &ref_dev    , &ref_pitch    , ref_width*sof   , ref_height   );
    cudaMallocPitch( (void **) &query_dev  , &query_pitch  , query_width*sof , query_height );
    cudaMallocPitch( (void **) &dist_dev   , &dist_pitch   , query_width*sof , ref_width    );
    cudaMallocPitch( (void **) &output_dev , &output_pitch , query_width*sof , 1            );

These 4 allocations are the first thing done after variable initialisation.

As you can see, it’s very simple… The allocation that causes the error is the 3rd one (the one for dist_dev).
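Here is a minimal sketch of the same four allocations with the return value checked and the largest buffer (dist) requested first, as described earlier in the thread. It assumes that `sof` stands for sizeof(float); the struct `DeviceBuffers` and the helpers `alloc_pitched`, `alloc_all` and `free_all` are hypothetical names introduced for illustration, not part of the original code.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical container for the four device buffers and their pitches.
struct DeviceBuffers {
    float *ref = nullptr, *query = nullptr, *dist = nullptr, *output = nullptr;
    std::size_t ref_pitch = 0, query_pitch = 0, dist_pitch = 0, output_pitch = 0;
};

// Allocate one pitched buffer of floats; report the CUDA error on failure.
inline bool alloc_pitched(float **ptr, std::size_t *pitch, std::size_t width_elems,
                          std::size_t height, const char *name) {
    cudaError_t err = cudaMallocPitch(reinterpret_cast<void **>(ptr), pitch,
                                      width_elems * sizeof(float), height);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocPitch(%s) failed: %s\n",
                     name, cudaGetErrorString(err));
        cudaGetLastError();  // clear the error state
        *ptr = nullptr;
        return false;
    }
    return true;
}

// Release everything; cudaFree on a null pointer is a no-op.
inline void free_all(DeviceBuffers &b) {
    cudaFree(b.dist);
    cudaFree(b.ref);
    cudaFree(b.query);
    cudaFree(b.output);
    b = DeviceBuffers{};
}

// Allocate all four buffers, largest first; on any failure free what was obtained.
inline bool alloc_all(DeviceBuffers &b, std::size_t ref_width, std::size_t ref_height,
                      std::size_t query_width, std::size_t query_height) {
    b = DeviceBuffers{};
    const bool ok =
        alloc_pitched(&b.dist,   &b.dist_pitch,   query_width, ref_width,    "dist")   &&
        alloc_pitched(&b.ref,    &b.ref_pitch,    ref_width,   ref_height,   "ref")    &&
        alloc_pitched(&b.query,  &b.query_pitch,  query_width, query_height, "query")  &&
        alloc_pitched(&b.output, &b.output_pitch, query_width, 1,            "output");
    if (!ok) free_all(b);
    return ok;
}
```

A caller would declare a `DeviceBuffers b;`, call `alloc_all(b, ref_width, ref_height, query_width, query_height)` and bail out (or retry with a smaller M) when it returns false. Note that requesting the big buffer first only helps if fragmentation really is the cause, which this thread does not establish.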