Memory usage drops after I run trtexec

Something very strange: when I start my app, which loads several models, it uses about 3 GB of RAM, leaving about 1 GB of RAM free in total. If I then run trtexec --loadEngine=XXX while the app is still running, the app's memory usage drops to only 1.3 GB. Why could this happen?


The memory is used for both workspaces and loading the essential libraries.

When you deserialize an engine file, TensorRT doesn’t need to load the model conversion library (e.g., the ONNX parser), which saves memory.


Maybe I misled you. The phenomenon is this: when I start my app alone (it loads several models that are already serialized to engine files), it uses about 3 GB of RAM after startup. While the app is running, I run trtexec --loadEngine=XXX in another shell to test some model, and my app then uses only 1.3 GB of RAM (its memory usage drops directly, without a restart). I wonder whether CUDA or TensorRT keeps some cache at startup that can be freed when there is not enough RAM for another app to load one more model.

Any advice on how to investigate this strange problem?


How do you measure the memory usage of your app?
Do you read the data from /proc/self/stat?

Also, could you run a similar test with our default TensorRT example?
It would be good to know whether this issue is app-dependent.


Thanks for your reply. I have tested it with only trtexec. When I start one trtexec process (with a very long run to keep it alive), it uses 1.8 GB of RAM. When I then start two more trtexec processes, the first trtexec process uses 1.4 GB (without being restarted). I measured RAM with “top”. According to jtop, the CPU RAM usage dropped.


We recommend monitoring the process memory usage in a way similar to the code below.
It gives you more accurate memory usage for each process.

#include <fstream>
#include <iostream>
#include <string>
#include <unistd.h>  // sysconf, _SC_PAGE_SIZE

// Reads this process's virtual size and resident set size, both in bytes.
// Note: parsing breaks if the executable name (comm) contains spaces.
void process_mem_usage(double & vm_usage, double & resident_set)
{
    std::ifstream stat("/proc/self/stat", std::ios::in);

    // dummy vars for leading entries in stat that we don't care about
    std::string pid, comm, state, ppid, pgrp, session, tty_nr;
    std::string tpgid, flags, minflt, cminflt, majflt, cmajflt;
    std::string utime, stime, cutime, cstime, priority, nice;
    std::string num_threads, itrealvalue, starttime;

    // the two fields we want
    unsigned long vsize = 0;  // virtual memory size, in bytes
    long rss = 0;             // resident set size, in pages

    stat >> pid >> comm >> state >> ppid >> pgrp >> session >> tty_nr
         >> tpgid >> flags >> minflt >> cminflt >> majflt >> cmajflt
         >> utime >> stime >> cutime >> cstime >> priority >> nice
         >> num_threads >> itrealvalue >> starttime >> vsize >> rss;

    vm_usage = vsize;
    resident_set = rss * sysconf(_SC_PAGE_SIZE);  // pages -> bytes
}

void print_mem_usage(const char * description)
{
    double vm_usage, resident_set;
    process_mem_usage(vm_usage, resident_set);
    std::cout << "Memory usage " << description << ": " << vm_usage / (1024*1024) << " MiB virtual, "
        << resident_set / (1024*1024) << " MiB resident" << std::endl;
}
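
The snippet above reads /proc/self/stat, so it only reports on the process that calls it. To watch an already running process such as trtexec from the outside, a rough sketch along the following lines might help. It reads fields from /proc/<pid>/status, which on Linux reports VmRSS (resident memory) and VmSwap (swapped-out memory) in kB. The helper names read_status_kb and process_status_kb are made up for this illustration and are not part of TensorRT or any NVIDIA tool.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: return the value (in kB) of a field such as "VmRSS"
// from text in the /proc/<pid>/status format, or -1 if it is missing.
long read_status_kb(std::istream& in, const std::string& key)
{
    std::string line;
    while (std::getline(in, line)) {
        // Lines look like "VmRSS:\t  123456 kB"; match "<key>:" at the start.
        if (line.size() > key.size() &&
            line.compare(0, key.size(), key) == 0 &&
            line[key.size()] == ':') {
            std::istringstream fields(line.substr(key.size() + 1));
            long kb = 0;
            return (fields >> kb) ? kb : -1;
        }
    }
    return -1;
}

// Hypothetical wrapper: read one field for an arbitrary process ID.
// Returns -1 if the status file cannot be opened (e.g. the PID has exited).
long process_status_kb(int pid, const std::string& key)
{
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    if (!status) {
        return -1;
    }
    return read_status_kb(status, key);
}
```

Calling process_status_kb(pid, "VmRSS") and process_status_kb(pid, "VmSwap") periodically for the first trtexec process, while the other processes start, would show whether its resident memory was actually released or only moved out of RAM. If VmSwap grows by roughly the amount that VmRSS shrinks, the pages were swapped out rather than freed. That is only one possibility to check and is not confirmed anywhere in this thread.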

