[Jetson] Why can cudaMalloc succeed when "free" shows less memory than requested?

CUDA version is 11.4; Orin AGX 64GB.
When I run the free command I get the output below:

              total        used        free      shared  buff/cache   available
Mem:       64300584    39123416     2623172     1890612    22553996    22145096
Swap:      32150256    11040460    21109796

Now if I cudaMalloc 5GB of memory, will it succeed?
I actually tested it with the code below:

#include <stdio.h>
#include <unistd.h>
#include <cuda_runtime.h>

#define SIZE (5 * 1024 * 1024 * 1024ULL) // 5GB

int main() {
    cudaError_t err;
    void *d_memory;
    while (1) {
        err = cudaMalloc(&d_memory, SIZE);
        if (err != cudaSuccess) {
            fprintf(stderr, "Failed to allocate device memory - %s\n", cudaGetErrorString(err));
            return 1;
        }
        err = cudaMemset(d_memory, 0, SIZE);
        if (err != cudaSuccess) {
            fprintf(stderr, "Failed to set device memory - %s\n", cudaGetErrorString(err));
            return 1;
        }
        err = cudaMemset(d_memory, 10, SIZE);
        if (err != cudaSuccess) {
            fprintf(stderr, "Failed to set device memory - %s\n", cudaGetErrorString(err));
            return 1;
        }
        cudaFree(d_memory); // release before the next iteration so the loop doesn't leak
        sleep(1);
    }
    return 0;
}

And it succeeded.
My question is: why can cudaMalloc get 5GB when "free" shows only ~2GB free? Does cudaMalloc take memory from "buff/cache" the way a CPU process does?
Thanks for your reply!

I can’t answer what cudaMalloc does or requires, but buffer and cache memory is essentially free memory on any Linux system. Some operations simply perform better when data is buffered or cached, but the processes using that data do not require the memory to stay cached. If anything else requests memory, buffer and/or cache is treated as free. As an example, if you were to run the following on a freshly booted system that has not yet needed to access the disk, you would see buffer/cache go up:

cd /
sudo find . -type f -print0 | xargs -0 cat > /dev/null

Then, if memory is limited and you start applications that require it, you would see cache/buffer going down. The kernel knows that once a file has been read, it is more efficient to read it again from RAM so long as it has not been written; likewise, modifications can be written to RAM and then lazily flushed to the actual files. If there is free RAM, the act of reading file content tends to put a copy in RAM; when something else requires RAM, that buffer/cache is released and nothing is lost. If a file was edited, it is flushed to disk before its RAM is released, but unmodified file data is released from buffer/cache first.

An interesting GUI app is xosview; you can use the mouse to enlarge it, and it shows bar-chart details of cache and buffer usage.

Thanks linuxdev!
But my question is: does cudaMalloc do the same thing as malloc?

I can’t answer the specifics of cudaMalloc internals (perhaps @dusty_nv can?). However, requests for memory are part of the kernel’s memory management, and I don’t think cudaMalloc bypasses that mechanism.

Any driver or user space program goes through resource management (see “man -a ulimit”). Keep in mind that your question isn’t really about cudaMalloc: You’re asking about buffer and cache being given up when something else requires more memory. It doesn’t matter that it is cudaMalloc requesting that memory and applying pressure to resources. So describing cudaMalloc is just coincidental to the question of buffer and cache release.


cudaMalloc can only use available physical memory.
A possible reason it succeeds is that the request forces the system to release some reclaimable (buffer/cache) memory.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.