How to measure the GPU memory of each layer of a neural network

Hi, is there a way to measure GPU memory at each layer of the neural network and see which layer consumes the most GPU memory?

Hi @xidiantuoersuo,

You can find this information in the engine building logs.
Example:

[I] [TRT] [MemUsageSnapshot] Builder begin: CPU 1074 MiB, GPU 2227 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1074, GPU 2235 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1074, GPU 2251 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1074, GPU 2259 (MiB)
[I] [TRT] [MemUsageSnapshot] Builder end: CPU 1074 MiB, GPU 2227 MiB
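
If the build log is saved to a file, lines of this shape can be summarized with a small script. The following is only a sketch, assuming the `[MemUsageChange]` format shown above; the exact wording may differ between TensorRT versions, and `gpu_deltas` is a hypothetical helper name, not part of TensorRT.

```python
import re

# Matches lines shaped like the example above, e.g.
# [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1074, GPU 2259 (MiB)
# Assumption: the format is the one quoted in this thread.
CHANGE_RE = re.compile(
    r"\[MemUsageChange\]\s*(?P<step>.+?):\s*CPU\s*(?P<cpu>[+-]?\d+),"
    r"\s*GPU\s*(?P<gpu>[+-]?\d+),"
)


def gpu_deltas(log_text):
    """Return (step, gpu_delta_mib) pairs, largest GPU increase first."""
    rows = []
    for line in log_text.splitlines():
        m = CHANGE_RE.search(line)
        if m:
            rows.append((m.group("step"), int(m.group("gpu"))))
    # sorted() is stable, so steps with equal deltas keep their log order.
    return sorted(rows, key=lambda r: r[1], reverse=True)


sample = """\
[I] [TRT] [MemUsageSnapshot] Builder begin: CPU 1074 MiB, GPU 2227 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1074, GPU 2235 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1074, GPU 2259 (MiB)
"""

for step, delta in gpu_deltas(sample):
    print(f"{step}: GPU {delta:+d} MiB")
```

This only reports the deltas that the builder itself logs for each initialization step; it does not break memory down by network layer.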

And on a per-layer basis:

[V] [TRT] Layer: (Unnamed Layer* 0) [Pooling] HostPersistent: 0 DevicePersistent: 0
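
To see which layer reports the largest persistent allocation, the per-layer lines can be collected from a saved verbose log and sorted. This is a rough sketch, assuming the `Layer: ... HostPersistent: ... DevicePersistent: ...` format shown above. The helper name `rank_layers` and the `conv1` sample line are made up for illustration, and the log does not state the unit of these values. It ranks only the persistent figures that the log prints, not the total memory a layer uses at runtime.

```python
import re

# Matches lines shaped like the example above, e.g.
# [V] [TRT] Layer: (Unnamed Layer* 0) [Pooling] HostPersistent: 0 DevicePersistent: 0
# Assumption: the format is the one quoted in this thread.
LAYER_RE = re.compile(
    r"Layer:\s*(?P<name>.+?)\s+HostPersistent:\s*(?P<host>\d+)"
    r"\s+DevicePersistent:\s*(?P<dev>\d+)"
)


def rank_layers(log_text):
    """Return (layer_name, device_persistent) pairs, largest first."""
    rows = []
    for line in log_text.splitlines():
        m = LAYER_RE.search(line)
        if m:
            rows.append((m.group("name"), int(m.group("dev"))))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# The first line is taken from the example above; the second is a
# hypothetical line added only so the ranking has something to sort.
sample = """\
[V] [TRT] Layer: (Unnamed Layer* 0) [Pooling] HostPersistent: 0 DevicePersistent: 0
[V] [TRT] Layer: conv1 HostPersistent: 0 DevicePersistent: 512
"""

for name, dev in rank_layers(sample):
    print(f"{dev:>10}  {name}")
```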

Thank you.

Hi @spolisetty,
Thank you for your reply. I used the following command:

./trtexec --deploy=/ModelZoo/mobilenet/mobilenet_deploy.prototxt --output=prob --verbose=true --useCudaGraph=true --dumpProfile=true

In the engine building logs, I can’t find MemUsageSnapshot or any per-layer entries. Are there any other options I should use?

&&&& RUNNING TensorRT.trtexec # ./trtexec --deploy=/ModelZoo/mobilenet/mobilenet_deploy.prototxt --output=prob --verbose=true
[06/01/2021-03:55:54] [I] === Model Options ===
[06/01/2021-03:55:54] [I] Format: Caffe
[06/01/2021-03:55:54] [I] Model: 
[06/01/2021-03:55:54] [I] Prototxt: /ModelZoo/mobilenet/mobilenet_deploy.prototxt
[06/01/2021-03:55:54] [I] Output: prob
[06/01/2021-03:55:54] [I] === Build Options ===
[06/01/2021-03:55:54] [I] Max batch: 1
[06/01/2021-03:55:54] [I] Workspace: 16 MB
[06/01/2021-03:55:54] [I] minTiming: 1
[06/01/2021-03:55:54] [I] avgTiming: 8
[06/01/2021-03:55:54] [I] Precision: FP32
[06/01/2021-03:55:54] [I] Calibration: 
[06/01/2021-03:55:54] [I] Safe mode: Disabled
[06/01/2021-03:55:54] [I] Save engine: 
[06/01/2021-03:55:54] [I] Load engine: 
[06/01/2021-03:55:54] [I] Inputs format: fp32:CHW
[06/01/2021-03:55:54] [I] Outputs format: fp32:CHW
[06/01/2021-03:55:54] [I] Input build shapes: model
[06/01/2021-03:55:54] [I] === System Options ===
[06/01/2021-03:55:54] [I] Device: 0
[06/01/2021-03:55:54] [I] DLACore: 
[06/01/2021-03:55:54] [I] Plugins:
[06/01/2021-03:55:54] [I] === Inference Options ===
[06/01/2021-03:55:54] [I] Batch: 1
[06/01/2021-03:55:54] [I] Iterations: 10
[06/01/2021-03:55:54] [I] Duration: 3s (+ 200ms warm up)
[06/01/2021-03:55:54] [I] Sleep time: 0ms
[06/01/2021-03:55:54] [I] Streams: 1
[06/01/2021-03:55:54] [I] ExposeDMA: Disabled
[06/01/2021-03:55:54] [I] Spin-wait: Disabled
[06/01/2021-03:55:54] [I] Multithreading: Disabled
[06/01/2021-03:55:54] [I] CUDA Graph: Disabled
[06/01/2021-03:55:54] [I] Skip inference: Disabled
[06/01/2021-03:55:54] [I] Input inference shapes: model
[06/01/2021-03:55:54] [I] Inputs:
[06/01/2021-03:55:54] [I] === Reporting Options ===
[06/01/2021-03:55:54] [I] Verbose: Enabled
[06/01/2021-03:55:54] [I] Averages: 10 inferences
[06/01/2021-03:55:54] [I] Percentile: 99
[06/01/2021-03:55:54] [I] Dump output: Disabled
[06/01/2021-03:55:54] [I] Profile: Disabled
[06/01/2021-03:55:54] [I] Export timing to JSON file: 
[06/01/2021-03:55:54] [I] Export output to JSON file: 
[06/01/2021-03:55:54] [I] Export profile to JSON file: 
....
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681836 ms - Host latency: 0.739844 ms (end to end 1.32114 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682446 ms - Host latency: 0.744922 ms (end to end 1.29099 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681934 ms - Host latency: 0.742871 ms (end to end 1.24248 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.68479 ms - Host latency: 0.745581 ms (end to end 1.26846 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680444 ms - Host latency: 0.739697 ms (end to end 1.31262 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.67854 ms - Host latency: 0.733423 ms (end to end 1.29956 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679321 ms - Host latency: 0.733887 ms (end to end 1.30076 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679028 ms - Host latency: 0.733569 ms (end to end 1.29526 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685083 ms - Host latency: 0.74751 ms (end to end 1.28599 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678809 ms - Host latency: 0.731909 ms (end to end 1.25623 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678076 ms - Host latency: 0.737842 ms (end to end 1.25083 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678101 ms - Host latency: 0.743311 ms (end to end 1.21824 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.67937 ms - Host latency: 0.746265 ms (end to end 1.23091 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678809 ms - Host latency: 0.743091 ms (end to end 1.22178 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680859 ms - Host latency: 0.743774 ms (end to end 1.23784 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.695264 ms - Host latency: 0.756445 ms (end to end 1.2334 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682568 ms - Host latency: 0.741992 ms (end to end 1.22295 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681812 ms - Host latency: 0.738989 ms (end to end 1.22131 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.68291 ms - Host latency: 0.740967 ms (end to end 1.21426 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680786 ms - Host latency: 0.737744 ms (end to end 1.23657 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679248 ms - Host latency: 0.738477 ms (end to end 1.24563 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679907 ms - Host latency: 0.736401 ms (end to end 1.24441 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.67981 ms - Host latency: 0.73689 ms (end to end 1.21582 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678735 ms - Host latency: 0.736035 ms (end to end 1.27935 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683472 ms - Host latency: 0.747021 ms (end to end 1.32166 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679761 ms - Host latency: 0.732813 ms (end to end 1.30002 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679736 ms - Host latency: 0.744946 ms (end to end 1.21824 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678101 ms - Host latency: 0.743384 ms (end to end 1.20908 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679224 ms - Host latency: 0.742212 ms (end to end 1.22161 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682202 ms - Host latency: 0.744629 ms (end to end 1.18691 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682031 ms - Host latency: 0.741699 ms (end to end 1.20525 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683936 ms - Host latency: 0.74165 ms (end to end 1.23748 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679541 ms - Host latency: 0.736279 ms (end to end 1.21372 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680054 ms - Host latency: 0.737158 ms (end to end 1.21018 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679663 ms - Host latency: 0.736279 ms (end to end 1.2532 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.68064 ms - Host latency: 0.739038 ms (end to end 1.21279 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679956 ms - Host latency: 0.737354 ms (end to end 1.24702 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680249 ms - Host latency: 0.73833 ms (end to end 1.22046 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680151 ms - Host latency: 0.737476 ms (end to end 1.24048 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679688 ms - Host latency: 0.736426 ms (end to end 1.22644 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679517 ms - Host latency: 0.765015 ms (end to end 1.25208 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678076 ms - Host latency: 0.733105 ms (end to end 1.26252 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679761 ms - Host latency: 0.737915 ms (end to end 1.18894 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.68186 ms - Host latency: 0.740332 ms (end to end 1.20398 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682886 ms - Host latency: 0.742358 ms (end to end 1.17568 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680249 ms - Host latency: 0.740942 ms (end to end 1.18831 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.687622 ms - Host latency: 0.747168 ms (end to end 1.17183 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.690503 ms - Host latency: 0.749707 ms (end to end 1.17585 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685669 ms - Host latency: 0.745264 ms (end to end 1.17524 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681689 ms - Host latency: 0.74148 ms (end to end 1.21931 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.676562 ms - Host latency: 0.737134 ms (end to end 1.23152 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684863 ms - Host latency: 0.744043 ms (end to end 1.16785 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685034 ms - Host latency: 0.744849 ms (end to end 1.18584 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685327 ms - Host latency: 0.74458 ms (end to end 1.17815 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.687866 ms - Host latency: 0.74729 ms (end to end 1.18259 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685132 ms - Host latency: 0.744385 ms (end to end 1.16345 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683423 ms - Host latency: 0.742725 ms (end to end 1.16917 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683032 ms - Host latency: 0.747192 ms (end to end 1.24456 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684033 ms - Host latency: 0.754639 ms (end to end 1.23228 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.691943 ms - Host latency: 0.762012 ms (end to end 1.24082 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684082 ms - Host latency: 0.756567 ms (end to end 1.23799 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684961 ms - Host latency: 0.756787 ms (end to end 1.25193 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679736 ms - Host latency: 0.746533 ms (end to end 1.20073 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678247 ms - Host latency: 0.745264 ms (end to end 1.19492 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678394 ms - Host latency: 0.745898 ms (end to end 1.18499 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680371 ms - Host latency: 0.745239 ms (end to end 1.18772 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684302 ms - Host latency: 0.74502 ms (end to end 1.19785 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681348 ms - Host latency: 0.737158 ms (end to end 1.23582 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680737 ms - Host latency: 0.738355 ms (end to end 1.22441 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682959 ms - Host latency: 0.740942 ms (end to end 1.2071 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679883 ms - Host latency: 0.736938 ms (end to end 1.22529 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.679199 ms - Host latency: 0.736523 ms (end to end 1.23384 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681079 ms - Host latency: 0.737817 ms (end to end 1.21689 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.680347 ms - Host latency: 0.738647 ms (end to end 1.19302 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.67898 ms - Host latency: 0.736133 ms (end to end 1.25095 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678296 ms - Host latency: 0.747876 ms (end to end 1.20647 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.677173 ms - Host latency: 0.74563 ms (end to end 1.19336 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681982 ms - Host latency: 0.744263 ms (end to end 1.18262 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.753467 ms - Host latency: 0.814331 ms (end to end 1.3624 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.704102 ms - Host latency: 0.764648 ms (end to end 1.21497 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684302 ms - Host latency: 0.744971 ms (end to end 1.22046 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.687378 ms - Host latency: 0.758618 ms (end to end 1.24504 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.686914 ms - Host latency: 0.757422 ms (end to end 1.22886 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683496 ms - Host latency: 0.754639 ms (end to end 1.24199 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681079 ms - Host latency: 0.740088 ms (end to end 1.26707 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.678076 ms - Host latency: 0.731763 ms (end to end 1.28528 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681421 ms - Host latency: 0.740234 ms (end to end 1.2165 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.686816 ms - Host latency: 0.74375 ms (end to end 1.17727 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683081 ms - Host latency: 0.741675 ms (end to end 1.20007 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684766 ms - Host latency: 0.74602 ms (end to end 1.174 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683008 ms - Host latency: 0.746191 ms (end to end 1.18525 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.686816 ms - Host latency: 0.75354 ms (end to end 1.16421 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685278 ms - Host latency: 0.751685 ms (end to end 1.16707 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685376 ms - Host latency: 0.748389 ms (end to end 1.18972 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.684692 ms - Host latency: 0.749902 ms (end to end 1.18191 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.683423 ms - Host latency: 0.749121 ms (end to end 1.1646 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.685742 ms - Host latency: 0.75188 ms (end to end 1.18513 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681519 ms - Host latency: 0.746729 ms (end to end 1.16379 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.681445 ms - Host latency: 0.746313 ms (end to end 1.16665 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682642 ms - Host latency: 0.742603 ms (end to end 1.23538 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.686304 ms - Host latency: 0.749146 ms (end to end 1.23372 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.686743 ms - Host latency: 0.759888 ms (end to end 1.30112 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.686499 ms - Host latency: 0.758765 ms (end to end 1.29636 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.6823 ms - Host latency: 0.742139 ms (end to end 1.23064 ms)
[06/01/2021-03:56:11] [I] Average on 10 runs - GPU latency: 0.682617 ms - Host latency: 0.741699 ms (end to end 1.24824 ms)
[06/01/2021-03:56:11] [I] Host latency
[06/01/2021-03:56:11] [I] min: 0.724426 ms (end to end 0.739685 ms)
[06/01/2021-03:56:11] [I] max: 1.29987 ms (end to end 1.82343 ms)
[06/01/2021-03:56:11] [I] mean: 0.745126 ms (end to end 1.21566 ms)
[06/01/2021-03:56:11] [I] median: 0.742798 ms (end to end 1.21497 ms)
[06/01/2021-03:56:11] [I] percentile: 0.776611 ms at 99% (end to end 1.35791 ms at 99%)
[06/01/2021-03:56:11] [I] throughput: 1452.63 qps
[06/01/2021-03:56:11] [I] walltime: 3.00214 s
[06/01/2021-03:56:11] [I] GPU Compute
[06/01/2021-03:56:11] [I] min: 0.667664 ms
[06/01/2021-03:56:11] [I] max: 1.24316 ms
[06/01/2021-03:56:11] [I] mean: 0.683192 ms
[06/01/2021-03:56:11] [I] median: 0.680908 ms
[06/01/2021-03:56:11] [I] percentile: 0.702637 ms at 99%
[06/01/2021-03:56:11] [I] total compute time: 2.9794 s
&&&& PASSED TensorRT.trtexec # ./trtexec --deploy=/ModelZoo/mobilenet/mobilenet_deploy.prototxt --output=prob --verbose=true

Thank you

Hi @xidiantuoersuo,

We do not offer an API to query how much memory each layer consumes. TensorRT always allocates all required memory at once. You can check how much memory is used in the verbose log, which prints all block information, scratch space size, and persistent memory usage.

Could you please let us know which TensorRT version you are using?

Thank you.

Hi @spolisetty,
The version I’m using is 7.0.

Thank you

@xidiantuoersuo,

Please try using the latest TensorRT version to get more information in the logs. As mentioned in the previous reply, we do not offer an API to query how much memory each layer consumes.

Thank you.

@spolisetty
Thank you for your reply. I will try the latest version.

Thank you