Help with ERROR cudnnEngine.cpp (56) on TX2

I am using TensorRT 3.0.2 with CUDA 9.0 and cuDNN 7.0.
When I try to convert a Caffe model to a TensorRT engine on the TX2, it fails with what looks like a memory allocation failure:
ERROR: cudnnEngine.cpp (56) - Cuda Error in void nvinfer1::cudnn::initializeCommonContext(nvinfer1::cudnn::CommonContext&): 4

I found that the line that triggers the error is ICudaEngine* engine = builder->buildCudaEngine(*network)

My caffemodel is about 235 MB. Is it too large?

Hello,

The GPU on the Tegra TX2 does not have its own memory. It is hard-wired to the memory controller and shares system RAM with the CPU. In this case, they share 8 GB.

To get the Tegra GPU status and memory usage, use the following:

sudo ~/tegrastats --interval 5000

or this handy set of scripts (unsupported)

Hello,
I monitored the TX2 while the conversion was running, and RAM usage never appeared to exceed about 20%. I have no idea why the error occurred, because the same code works fine on my own Linux server.

Here is part of the log from the run:
34 APE 150 MTS fg 0% bg 0% BCPU@53C MCPU@53C GPU@50.5C PLL@53C Tboard@45C Tdiode@50.5C PMIC@100C thermal@52C VDD_IN 6265/6462 VDD_CPU 952/1070 VDD_GPU 238/218 VDD_SOC 2048/2053 VDD_WIFI 0/69 VDD_DDR 1574/1584
01-29 02:14:02.208 2141 2141 I TegraStats: RAM 1868/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [25%@1915,7%@1881,7%@1881,9%@1880,35%@1882,48%@1882] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@53C MCPU@53C GPU@50.5C PLL@53C Tboard@45C Tdiode@50.25C PMIC@100C thermal@52C VDD_IN 6313/6461 VDD_CPU 1048/1070 VDD_GPU 238/218 VDD_SOC 2048/2053 VDD_WIFI 0/69 VDD_DDR 1593/1584
01-29 02:14:03.220 2141 2141 I TegraStats: RAM 1867/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [18%@1883,6%@1883,3%@1881,11%@1882,41%@1880,15%@1881] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@53C MCPU@53C GPU@50C PLL@53C Tboard@45C Tdiode@50.5C PMIC@100C thermal@52C VDD_IN 6527/6461 VDD_CPU 952/1069 VDD_GPU 238/218 VDD_SOC 2048/2053 VDD_WIFI 268/69 VDD_DDR 1593/1584
01-29 02:14:04.230 2141 2141 I TegraStats: RAM 1867/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [14%@1785,6%@1881,1%@1882,11%@1882,38%@1881,7%@1882] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@53C MCPU@53C GPU@50C PLL@53C Tboard@45C Tdiode@50.5C PMIC@100C thermal@52C VDD_IN 6670/6462 VDD_CPU 905/1069 VDD_GPU 190/218 VDD_SOC 2048/2053 VDD_WIFI 384/70 VDD_DDR 1574/1584
01-29 02:14:05.242 2141 2141 I TegraStats: RAM 1867/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [12%@1882,7%@1881,2%@1882,10%@1880,38%@1881,7%@1881] EMC_FREQ 6%@1866 GR3D_FREQ 1%@1134 APE 150 MTS fg 0% bg 0% BCPU@52.5C MCPU@52.5C GPU@50.5C PLL@52.5C Tboard@45C Tdiode@50.5C PMIC@100C thermal@52C VDD_IN 6265/6461 VDD_CPU 905/1069 VDD_GPU 238/218 VDD_SOC 2048/2053 VDD_WIFI 57/70 VDD_DDR 1574/1584
01-29 02:14:06.253 2141 2141 I TegraStats: RAM 1898/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [34%@1881,10%@1881,94%@1882,9%@1880,40%@1880,5%@1882] EMC_FREQ 6%@1866 GR3D_FREQ 3%@1134 APE 150 MTS fg 0% bg 3% BCPU@53.5C MCPU@53.5C GPU@50.5C PLL@53.5C Tboard@45C Tdiode@50.5C PMIC@100C thermal@52.1C VDD_IN 7385/6463 VDD_CPU 1952/1070 VDD_GPU 238/218 VDD_SOC 2095/2053 VDD_WIFI 0/70 VDD_DDR 1651/1584
01-29 02:14:07.264 2141 2141 I TegraStats: RAM 1970/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [54%@1832,7%@1881,99%@1882,10%@1882,38%@1881,10%@1881] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 8% BCPU@53.5C MCPU@53.5C GPU@50.5C PLL@53.5C Tboard@45C Tdiode@50.5C PMIC@100C thermal@52.3C VDD_IN 7594/6465 VDD_CPU 2142/1072 VDD_GPU 238/218 VDD_SOC 2095/2054 VDD_WIFI 0/70 VDD_DDR 1670/1584
01-29 02:14:08.275 2141 2141 I TegraStats: RAM 2165/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [26%@1880,10%@1882,100%@1881,7%@1881,56%@1881,18%@1881] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@54C MCPU@54C GPU@50.5C PLL@54C Tboard@45C Tdiode@50.75C PMIC@100C thermal@52.6C VDD_IN 7761/6468 VDD_CPU 2331/1075 VDD_GPU 237/218 VDD_SOC 2095/2054 VDD_WIFI 0/70 VDD_DDR 1670/1584
01-29 02:14:09.285 2141 2141 I TegraStats: RAM 2336/7456MB (lfb 681x4MB) SWAP 0/512MB (cached 0MB) CPU [15%@1807,13%@1881,98%@1881,13%@1806,46%@1806,33%@1807] EMC_FREQ 7%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 1% BCPU@54C MCPU@54C GPU@51C PLL@54C Tboard@45C Tdiode@50.75C PMIC@100C thermal@52.8C VDD_IN 7547/6470 VDD_CPU 2047/1077 VDD_GPU 190/218 VDD_SOC 2095/2054 VDD_WIFI 0/69 VDD_DDR 1728/1584
01-29 02:14:10.298 2141 2141 I TegraStats: RAM 2576/7456MB (lfb 628x4MB) SWAP 0/512MB (cached 0MB) CPU [10%@1881,33%@1881,78%@1882,12%@1881,29%@1881,33%@1882] EMC_FREQ 7%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@54C MCPU@54C GPU@50.5C PLL@54C Tboard@45C Tdiode@51C PMIC@100C thermal@52.6C VDD_IN 7499/6472 VDD_CPU 1952/1078 VDD_GPU 238/218 VDD_SOC 2095/2054 VDD_WIFI 0/69 VDD_DDR 1747/1585
01-29 02:14:11.308 2141 2141 I TegraStats: RAM 2763/7456MB (lfb 580x4MB) SWAP 0/512MB (cached 0MB) CPU [14%@1881,17%@1880,92%@1881,12%@1880,10%@1882,34%@1881] EMC_FREQ 7%@1866 GR3D_FREQ 3%@1134 APE 150 MTS fg 0% bg 2% BCPU@54C MCPU@54C GPU@51C PLL@54C Tboard@45C Tdiode@51C PMIC@100C thermal@52.8C VDD_IN 7523/6474 VDD_CPU 1999/1080 VDD_GPU 238/218 VDD_SOC 2095/2054 VDD_WIFI 0/69 VDD_DDR 1708/1585
01-29 02:14:12.319 2141 2141 I TegraStats: RAM 2830/7456MB (lfb 593x4MB) SWAP 0/512MB (cached 0MB) CPU [17%@1656,18%@1925,78%@1881,20%@1882,13%@1882,36%@1881] EMC_FREQ 7%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 3% BCPU@54C MCPU@54C GPU@50.5C PLL@54C Tboard@45C Tdiode@51C PMIC@100C thermal@52.8C VDD_IN 7404/6475 VDD_CPU 1952/1082 VDD_GPU 238/218 VDD_SOC 2095/2054 VDD_WIFI 0/69 VDD_DDR 1651/1585
01-29 02:14:13.331 2141 2141 I TegraStats: RAM 2825/7456MB (lfb 691x4MB) SWAP 0/512MB (cached 0MB) CPU [22%@1850,33%@1881,100%@1881,26%@1849,25%@1848,30%@1850] EMC_FREQ 7%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@54.5C MCPU@54.5C GPU@51C PLL@54.5C Tboard@45C Tdiode@51C PMIC@100C thermal@52.6C VDD_IN 8451/6479 VDD_CPU 2617/1084 VDD_GPU 237/218 VDD_SOC 2141/2054 VDD_WIFI 0/69 VDD_DDR 1747/1585
01-29 02:14:14.342 2141 2141 I TegraStats: RAM 2832/7456MB (lfb 607x4MB) SWAP 0/512MB (cached 0MB) CPU [55%@1881,71%@1880,84%@1882,22%@1882,39%@1882,13%@1881] EMC_FREQ 7%@1866 GR3D_FREQ 3%@1134 APE 150 MTS fg 0% bg 0% BCPU@54.5C MCPU@54.5C GPU@51C PLL@54.5C Tboard@45C Tdiode@51.25C PMIC@100C thermal@53.1C VDD_IN 8594/6483 VDD_CPU 2664/1087 VDD_GPU 237/218 VDD_SOC 2141/2054 VDD_WIFI 0/69 VDD_DDR 1689/1586
01-29 02:14:15.354 2141 2141 I TegraStats: RAM 2833/7456MB (lfb 589x4MB) SWAP 0/512MB (cached 0MB) CPU [54%@1807,8%@1881,86%@1881,14%@1883,38%@1881,21%@1882] EMC_FREQ 7%@1866 GR3D_FREQ 3%@1134 APE 150 MTS fg 0% bg 0% BCPU@54C MCPU@54C GPU@51.5C PLL@54C Tboard@45C Tdiode@51.25C PMIC@100C thermal@53.1C VDD_IN 7880/6486 VDD_CPU 1998/1089 VDD_GPU 237/218 VDD_SOC 2095/2054 VDD_WIFI 0/69 VDD_DDR 1632/1586
01-29 02:14:16.365 2141 2141 I TegraStats: RAM 2836/7456MB (lfb 575x4MB) SWAP 0/512MB (cached 0MB) CPU [31%@1786,35%@1880,55%@1881,22%@1881,29%@1881,33%@1880] EMC_FREQ 7%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@54C MCPU@54C GPU@51C PLL@54C Tboard@45C Tdiode@51.25C PMIC@100C thermal@53.1C VDD_IN 8070/6489 VDD_CPU 1998/1091 VDD_GPU 237/218 VDD_SOC 2095/2054 VDD_WIFI 172/69 VDD_DDR 1632/1586
01-29 02:14:17.379 2141 2141 I TegraStats: RAM 2836/7456MB (lfb 572x4MB) SWAP 0/512MB (cached 0MB) CPU [27%@1807,40%@1881,10%@1881,26%@1809,16%@1804,35%@1808] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@54C MCPU@54C GPU@51C PLL@54C Tboard@45C Tdiode@51.25C PMIC@100C thermal@52.8C VDD_IN 7904/6491 VDD_CPU 1571/1092 VDD_GPU 238/218 VDD_SOC 2095/2054 VDD_WIFI 384/69 VDD_DDR 1632/1586
01-29 02:14:18.391 2141 2141 I TegraStats: RAM 2836/7456MB (lfb 568x4MB) SWAP 0/512MB (cached 0MB) CPU [28%@1882,48%@1881,33%@1882,22%@1881,22%@1880,37%@1881] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@54C MCPU@54C GPU@51C PLL@54C Tboard@45C Tdiode@51.25C PMIC@100C thermal@53.1C VDD_IN 7999/6494 VDD_CPU 1998/1093 VDD_GPU 237/218 VDD_SOC 2095/2055 VDD_WIFI 96/69 VDD_DDR 1612/1586
01-29 02:14:19.403 2141 2141 I TegraStats: RAM 1869/7456MB (lfb 680x4MB) SWAP 0/512MB (cached 0MB) CPU [38%@1881,23%@1881,50%@1881,19%@1881,24%@1881,34%@1882] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@54C MCPU@54C GPU@51C PLL@54C Tboard@45C Tdiode@51.5C PMIC@100C thermal@52.8C VDD_IN 7975/6497 VDD_CPU 1998/1095 VDD_GPU 237/218 VDD_SOC 2142/2055 VDD_WIFI 0/69 VDD_DDR 1651/1586
01-29 02:14:20.413 2141 2141 I TegraStats: RAM 1869/7456MB (lfb 680x4MB) SWAP 0/512MB (cached 0MB) CPU [15%@1901,6%@1881,46%@1882,15%@1882,12%@1882,34%@1881] EMC_FREQ 6%@1866 GR3D_FREQ 2%@1134 APE 150 MTS fg 0% bg 0% BCPU@53.5C MCPU@53.5C GPU@51C PLL@53.5C Tboard@45C Tdiode@51.5C PMIC@100C thermal@52.5C VDD_IN 6742/6497 VDD_CPU 1238/1095 VDD_GPU 238/218 VDD_SOC 2096/2055 VDD_WIFI 0/69 VDD_DDR 1593/1586

From the log, I see a peak memory usage of

TegraStats: RAM 2830/7456MB (lfb 593x4MB)

or about 38% usage, with roughly 2.3 GB (593 blocks of 4 MB) still free at the largest free block size. Please note that model size (235 MB) isn’t the only variable that affects memory usage. At runtime, TensorRT, the parser, and other components all require additional memory.
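For anyone who wants to pull these figures out of a long tegrastats log instead of reading them by eye, here is a minimal C++ sketch. It assumes the log lines contain the "RAM used/totalMB (lfb NxMMB)" pattern shown above; the names MemSample, parseMemSample, usedPercent and lfbTotalMB are made up for this example and are not part of any NVIDIA tool.

```cpp
#include <cstdio>
#include <string>

// Memory figures taken from one tegrastats line (hypothetical helper type).
struct MemSample {
    int usedMB = 0;      // RAM currently in use
    int totalMB = 0;     // total RAM reported by tegrastats
    int lfbBlocks = 0;   // number of free blocks of the largest size
    int lfbBlockMB = 0;  // size of one such block in MB
};

// Parses the "RAM used/totalMB (lfb NxMMB)" part of a tegrastats line.
// Returns false if the line does not contain that pattern.
bool parseMemSample(const std::string& line, MemSample& out) {
    const std::size_t pos = line.find("RAM ");
    if (pos == std::string::npos) return false;
    const int matched = std::sscanf(line.c_str() + pos,
                                    "RAM %d/%dMB (lfb %dx%dMB)",
                                    &out.usedMB, &out.totalMB,
                                    &out.lfbBlocks, &out.lfbBlockMB);
    return matched == 4;
}

// RAM in use as a percentage of the total.
double usedPercent(const MemSample& s) {
    return 100.0 * s.usedMB / s.totalMB;
}

// Memory held in free blocks of the largest size (count times block size).
int lfbTotalMB(const MemSample& s) {
    return s.lfbBlocks * s.lfbBlockMB;
}
```

Feeding it the line quoted above gives 2830 MB used out of 7456 MB and 593 x 4 MB = 2372 MB in largest-size free blocks. Running it over every line of the log and keeping the maximum of usedMB would give the peak.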

To debug this further, I’d recommend using a smaller model to verify that your workflow is correct and to rule out model size as the issue. Also keep in mind that error code 4 is cudaErrorLaunchFailure:

An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. All existing device memory allocations are invalid. To continue using CUDA, the process must be terminated and relaunched.

So this may not be related to memory size; it may be a pointer or handle issue.

Hello, I managed to solve the problem by using the 64-bit CUDA and TensorRT libraries instead of the 32-bit ones.
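In case it helps someone else catch the same kind of mismatch earlier: the post above does not say how the 32-bit libraries ended up in the build, but one cheap guard is to check the pointer width of the binary you are actually compiling. This is only a sketch using standard C++; pointerBits and is64BitBuild are hypothetical names chosen for this example.

```cpp
#include <climits>
#include <cstddef>

// Width of a pointer in bits for the current build target.
constexpr std::size_t pointerBits() {
    return sizeof(void*) * CHAR_BIT;
}

// True when the program is being compiled as a 64-bit binary.
constexpr bool is64BitBuild() {
    return pointerBits() == 64;
}
```

Putting static_assert(is64BitBuild(), "expected a 64-bit build"); near the top of the file that calls buildCudaEngine would make a 32-bit build fail at compile time instead of at runtime. It does not verify which library files the linker picks up, so the link paths still need to be checked separately.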