runtime error (7) : too many resources requested for launch at pytorch/aten/src/THC/THCTensorSort.cu...

THCudaCheck FAIL file=/home/nvidia/Documents/pytorch/aten/src/THC/THCTensorSort.cu line=62 error=7 : too many resources requested for launch

I get runtime error (7) when I run the Mask R-CNN inference demo (project link: https://github.com/facebookresearch/maskrcnn-benchmark) on an NVIDIA Jetson TX2. By the way, I also tested a YOLOv3 network on the TX2, and it works well.

  • PyTorch: torch-1.1.0a0+7c66ad7
  • PyTorch install: built from source
  • Python version: 3.5
  • CUDA/cuDNN version: CUDA 9.0 and cuDNN 7.1.5 (stock from TX2 JetPack 3.3)

error log:

THCudaCheck FAIL file=/home/nvidia/Documents/pytorch/aten/src/THC/THCTensorSort.cu line=62 error=7 : too many resources requested for launch
Traceback (most recent call last):
  File "demo.py", line 88, in <module>
    predictions = coco_demo.run_on_opencv_image(image)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/demo/predictor.py", line 93, in run_on_opencv_image
    predictions = self.compute_prediction(image)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/demo/predictor.py", line 124, in compute_prediction
    predictions = self.model(image_list)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 50, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/rpn.py", line 96, in forward
    return self._forward_test(anchors, objectness, rpn_box_regression)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/rpn.py", line 122, in _forward_test
    boxes = self.box_selector_test(anchors, objectness, rpn_box_regression)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/inference.py", line 138, in forward
    sampled_boxes.append(self.forward_for_single_feature_map(a, o, b))
  File "/home/nvidia/Documents/maskrcnn-benchmark-master/maskrcnn_benchmark/modeling/rpn/inference.py", line 93, in forward_for_single_feature_map
    objectness, topk_idx = objectness.topk(pre_nms_top_n, dim=1, sorted=True)
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/Documents/pytorch/aten/src/THC/THCTensorSort.cu:62
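For readers hitting the same trace: the failure is triggered by the `topk` call at the bottom, which selects the `pre_nms_top_n` highest-scoring proposals along dim 1. As a hedged illustration of what that call computes (a pure-Python sketch of the selection semantics, not the CUDA sort kernel that actually fails):

```python
import heapq

def topk(scores, k):
    """Return (values, indices) of the k largest entries, in descending
    order -- the same selection torch.Tensor.topk(k, sorted=True)
    performs along one row."""
    idx = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return [scores[i] for i in idx], idx

values, indices = topk([0.1, 0.9, 0.4, 0.7], k=2)
# values == [0.9, 0.7], indices == [1, 3]
```

The operation itself is cheap; the error comes from the CUDA kernel's launch configuration on the TX2's GPU, not from the amount of data being sorted.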

Hi,

Please check if you are running out of memory.

sudo ./tegrastats

If yes, please try to decrease the batch size of Mask R-CNN or use a smaller model.
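For example, in maskrcnn-benchmark these knobs live in its yacs config; a sketch of an override file (key names assumed from the repo's defaults and may differ by version):

```
# Hypothetical YAML overrides for maskrcnn-benchmark
TEST:
  IMS_PER_BATCH: 1          # smallest inference batch
MODEL:
  RPN:
    PRE_NMS_TOP_N_TEST: 1000  # fewer proposals to sort/top-k per image
```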

Thanks.

Hi
thanks for your quick reply. I used the command

sudo ./tegrastats

to monitor the GPU status while the Python program runs. It seems the TX2's RAM is sufficient for model inference. Btw, the batch size is 1 during inference.

RAM 1761/7846MB (lfb 811x4MB) CPU [0%@498,0%@499,0%@499,0%@499,0%@498,0%@499] EMC_FREQ 5%@408 GR3D_FREQ 12%@140 APE 150 MTS fg 0% bg 3% BCPU@31C MCPU@31C GPU@29C PLL@31C Tboard@28C Tdiode@29.25C PMIC@100C thermal@30.2C VDD_IN 1744/1744 VDD_CPU 147/147 VDD_GPU 49/49 VDD_SOC 294/294 VDD_WIFI 287/287 VDD_DDR 287/287
RAM 1761/7846MB (lfb 811x4MB) CPU [15%@806,4%@345,4%@345,17%@806,31%@807,23%@806] EMC_FREQ 5%@408 GR3D_FREQ 17%@140 APE 150 MTS fg 0% bg 0% BCPU@31C MCPU@31C GPU@29C PLL@31C Tboard@28C Tdiode@29.25C PMIC@100C thermal@30.2C VDD_IN 1940/1842 VDD_CPU 196/171 VDD_GPU 98/73 VDD_SOC 343/318 VDD_WIFI 267/277 VDD_DDR 382/334
RAM 1766/7846MB (lfb 811x4MB) CPU [20%@2034,3%@345,8%@345,9%@2032,11%@2034,9%@2034] EMC_FREQ 1%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@31C MCPU@31C GPU@29C PLL@31C Tboard@28C Tdiode@29C PMIC@100C thermal@30.2C VDD_IN 2063/1915 VDD_CPU 245/196 VDD_GPU 49/65 VDD_SOC 343/326 VDD_WIFI 306/286 VDD_DDR 420/363
RAM 1795/7846MB (lfb 811x4MB) CPU [33%@345,96%@1981,27%@1980,29%@345,29%@346,30%@345] EMC_FREQ 1%@1866 GR3D_FREQ 12%@140 APE 150 MTS fg 2% bg 5% BCPU@32C MCPU@32C GPU@29.5C PLL@32C Tboard@28C Tdiode@29.5C PMIC@100C thermal@30.2C VDD_IN 5168/2728 VDD_CPU 1908/624 VDD_GPU 48/61 VDD_SOC 783/440 VDD_WIFI 325/296 VDD_DDR 1279/592
RAM 1832/7846MB (lfb 811x4MB) CPU [5%@345,33%@2011,66%@2010,4%@346,3%@345,5%@345] EMC_FREQ 1%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 11% BCPU@32C MCPU@32C GPU@29.5C PLL@32C Tboard@28C Tdiode@29.75C PMIC@100C thermal@31C VDD_IN 5021/3187 VDD_CPU 1663/831 VDD_GPU 48/58 VDD_SOC 832/519 VDD_WIFI 306/298 VDD_DDR 1337/741
RAM 1977/7846MB (lfb 811x4MB) CPU [4%@1421,2%@1979,100%@1983,7%@1420,17%@1420,5%@1420] EMC_FREQ 1%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 5% BCPU@33C MCPU@33C GPU@30C PLL@33C Tboard@28C Tdiode@30C PMIC@100C thermal@31C VDD_IN 4997/3488 VDD_CPU 1712/978 VDD_GPU 48/56 VDD_SOC 832/571 VDD_WIFI 267/293 VDD_DDR 1318/837
RAM 2084/7846MB (lfb 811x4MB) CPU [4%@345,72%@2034,100%@2034,17%@345,18%@345,2%@345] EMC_FREQ 1%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 3% bg 0% BCPU@32.5C MCPU@32.5C GPU@30C PLL@32.5C Tboard@28C Tdiode@30C PMIC@100C thermal@31.5C VDD_IN 5875/3829 VDD_CPU 2444/1187 VDD_GPU 48/55 VDD_SOC 831/608 VDD_WIFI 287/292 VDD_DDR 1375/914
RAM 2168/7846MB (lfb 811x4MB) CPU [6%@345,98%@2005,97%@2009,3%@345,5%@345,4%@345] EMC_FREQ 1%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 5% bg 0% BCPU@32.5C MCPU@32.5C GPU@30C PLL@32.5C Tboard@28C Tdiode@30.25C PMIC@100C thermal@31.8C VDD_IN 5997/4100 VDD_CPU 2493/1351 VDD_GPU 48/54 VDD_SOC 831/636 VDD_WIFI 287/291 VDD_DDR 1413/976
RAM 2297/7846MB (lfb 811x4MB) CPU [8%@345,92%@1981,95%@1980,2%@345,7%@345,6%@345] EMC_FREQ 2%@1866 GR3D_FREQ 15%@140 APE 150 MTS fg 11% bg 1% BCPU@33C MCPU@33C GPU@30C PLL@33C Tboard@29C Tdiode@30.25C PMIC@100C thermal@31.8C VDD_IN 5752/4284 VDD_CPU 2249/1450 VDD_GPU 97/59 VDD_SOC 880/663 VDD_WIFI 191/280 VDD_DDR 1471/1031
RAM 2508/7846MB (lfb 793x4MB) CPU [7%@345,3%@1976,96%@1980,4%@345,4%@345,19%@345] EMC_FREQ 2%@1866 GR3D_FREQ 28%@140 APE 150 MTS fg 0% bg 2% BCPU@32.5C MCPU@32.5C GPU@30C PLL@32.5C Tboard@29C Tdiode@30.5C PMIC@100C thermal@31.5C VDD_IN 4874/4343 VDD_CPU 1615/1467 VDD_GPU 97/63 VDD_SOC 832/680 VDD_WIFI 172/269 VDD_DDR 1375/1065
RAM 2677/7846MB (lfb 762x4MB) CPU [4%@345,0%@2035,97%@2034,1%@345,4%@345,1%@345] EMC_FREQ 2%@1866 GR3D_FREQ 34%@140 APE 150 MTS fg 0% bg 5% BCPU@32.5C MCPU@32.5C GPU@30C PLL@32.5C Tboard@29C Tdiode@30.25C PMIC@100C thermal@31.7C VDD_IN 4874/4391 VDD_CPU 1566/1476 VDD_GPU 97/66 VDD_SOC 832/693 VDD_WIFI 172/260 VDD_DDR 1375/1093
RAM 2783/7846MB (lfb 724x4MB) CPU [20%@345,4%@345,66%@345,12%@346,12%@345,8%@346] EMC_FREQ 11%@408 GR3D_FREQ 29%@140 APE 150 MTS fg 0% bg 1% BCPU@32C MCPU@32C GPU@30C PLL@32C Tboard@29C Tdiode@30.25C PMIC@100C thermal@31.2C VDD_IN 3433/4311 VDD_CPU 735/1414 VDD_GPU 98/68 VDD_SOC 588/685 VDD_WIFI 267/261 VDD_DDR 880/1076
RAM 3046/7846MB (lfb 667x4MB) CPU [16%@345,2%@2035,54%@2035,22%@345,9%@345,8%@345] EMC_FREQ 3%@1866 GR3D_FREQ 4%@140 APE 150 MTS fg 0% bg 0% BCPU@32.5C MCPU@32.5C GPU@30C PLL@32.5C Tboard@29C Tdiode@30.25C PMIC@100C thermal@31.5C VDD_IN 4534/4328 VDD_CPU 1224/1399 VDD_GPU 97/70 VDD_SOC 783/692 VDD_WIFI 267/261 VDD_DDR 1337/1096
RAM 2629/7846MB (lfb 719x4MB) CPU [10%@345,23%@2010,69%@2012,5%@345,6%@345,7%@345] EMC_FREQ 3%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 13% BCPU@32.5C MCPU@32.5C GPU@30C PLL@32.5C Tboard@29C Tdiode@30.25C PMIC@100C thermal@31.5C VDD_IN 4948/4372 VDD_CPU 1614/1415 VDD_GPU 97/72 VDD_SOC 832/702 VDD_WIFI 210/257 VDD_DDR 1356/1114
RAM 2791/7846MB (lfb 719x4MB) CPU [7%@345,38%@2034,58%@2035,7%@345,14%@345,6%@345] EMC_FREQ 3%@1866 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 7% BCPU@32.5C MCPU@32.5C GPU@30.5C PLL@32.5C Tboard@29C Tdiode@30.5C PMIC@100C thermal@31.8C VDD_IN 5095/4421 VDD_CPU 1663/1431 VDD_GPU 97/74 VDD_SOC 832/711 VDD_WIFI 287/259 VDD_DDR 1375/1132
RAM 3001/7846MB (lfb 679x4MB) CPU [7%@345,0%@1977,96%@1980,4%@345,2%@345,11%@345] EMC_FREQ 3%@1866 GR3D_FREQ 10%@140 APE 150 MTS fg 0% bg 3% BCPU@32.5C MCPU@32.5C GPU@30C PLL@32.5C Tboard@29C Tdiode@30.5C PMIC@100C thermal@31.7C VDD_IN 4997/4457 VDD_CPU 1565/1439 VDD_GPU 97/75 VDD_SOC 832/718 VDD_WIFI 267/260 VDD_DDR 1413/1149
RAM 1875/7846MB (lfb 770x4MB) CPU [20%@2034,2%@345,86%@345,19%@2035,14%@2034,13%@2035] EMC_FREQ 9%@1866 GR3D_FREQ 0%@318 APE 150 MTS fg 0% bg 5% BCPU@33C MCPU@33C GPU@30.5C PLL@33C Tboard@29C Tdiode@31C PMIC@100C thermal@31.7C VDD_IN 6869/4598 VDD_CPU 1514/1444 VDD_GPU 1368/151 VDD_SOC 977/733 VDD_WIFI 287/261 VDD_DDR 1814/1188
RAM 1772/7846MB (lfb 805x4MB) CPU [22%@345,7%@345,9%@345,16%@345,7%@345,13%@344] EMC_FREQ 26%@408 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@32C MCPU@32C GPU@30C PLL@32C Tboard@29C Tdiode@30.5C PMIC@100C thermal@31.7C VDD_IN 2601/4487 VDD_CPU 343/1383 VDD_GPU 196/154 VDD_SOC 441/717 VDD_WIFI 287/263 VDD_DDR 612/1156
RAM 1772/7846MB (lfb 805x4MB) CPU [14%@345,1%@345,4%@345,2%@345,3%@345,14%@345] EMC_FREQ 15%@408 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@31.5C MCPU@31.5C GPU@30C PLL@31.5C Tboard@29C Tdiode@30C PMIC@100C thermal@31.2C VDD_IN 1866/4349 VDD_CPU 147/1318 VDD_GPU 98/151 VDD_SOC 343/697 VDD_WIFI 306/265 VDD_DDR 344/1113
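To read the log above at a glance: the RAM field peaks around 3046/7846 MB, so memory is not exhausted. A minimal stdlib parser for that field (format assumed from the tegrastats output above) can confirm this programmatically:

```python
import re

def parse_ram(line):
    """Extract (used_mb, total_mb) from a tegrastats line such as
    'RAM 1761/7846MB (lfb 811x4MB) ...'; return None if absent."""
    m = re.search(r"RAM (\d+)/(\d+)MB", line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

used, total = parse_ram("RAM 3046/7846MB (lfb 667x4MB) CPU [16%@345,...]")
# used / total is well under 1.0, so the board is not out of memory
```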

The problem has been resolved by the PyTorch team; the fix will be included in a later release.
https://github.com/pytorch/pytorch/issues/17144

Thanks for your information.