Facebookresearch/maskrcnn-benchmark on Jetson

Steps to reproduce the error:

$ git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
$ cd maskrcnn-benchmark
$ python3 setup.py install
xpt-relaxed-constexpr --compiler-options '-fPIC' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_72,code=sm_72 -std=c++14
/home/nvidia/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.cu(42): error: identifier "AT_CHECK" is undefined

/home/nvidia/maskrcnn-benchmark/maskrcnn_benchmark/csrc/cuda/deform_pool_cuda.cu(68): error: identifier "AT_CHECK" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_000030f8_00000000-6_deform_pool_cuda.cpp1.ii".
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
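
The `AT_CHECK` errors above are consistent with building against a PyTorch release that no longer defines that macro; newer releases provide `TORCH_CHECK` in its place. Below is a minimal sketch of one possible workaround, assuming that is the cause here: rewrite the macro name in the extension sources before rebuilding. The helper name `replace_at_check` and the suffix list are illustrative and not part of maskrcnn-benchmark.

```python
import re
from pathlib import Path

# Match AT_CHECK only as a whole identifier, not as part of a longer name.
AT_CHECK = re.compile(r"\bAT_CHECK\b")
# Assumed set of source suffixes worth scanning in the csrc tree.
SUFFIXES = {".cu", ".cuh", ".cpp", ".h"}


def replace_at_check(root):
    """Rewrite AT_CHECK to TORCH_CHECK in C++/CUDA sources under root.

    Returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in SUFFIXES or not path.is_file():
            continue
        text = path.read_text()
        new_text = AT_CHECK.sub("TORCH_CHECK", text)
        if new_text != text:
            path.write_text(new_text)
            changed.append(path)
    return changed
```

From the repository root, this could be called as `replace_at_check("maskrcnn_benchmark/csrc")` before running the build command again.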

ref: https://github.com/facebookresearch/maskrcnn-benchmark/issues/618
Retrying with

sudo python3.6 setup.py build develop

gives the same output.
more reference

Hi,

This error indicates an application-level issue.
Please check whether the comment below helps.

Thanks.

@AastaLLL
Thank you for following up.
We were also able to implement the following on Xavier with a ZED camera:

also more reference


However, while the proposed patch works, the example built on top of it seems dramatically slower than the TensorFlow solution referenced above.

Hi,

Please check the GPU utilization with tegrastats first.
If the implementation falls back to the CPU, a dramatic performance drop is expected.
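
To make the tegrastats check easier to log over time, the GPU load can be pulled out of each output line. This is a minimal sketch, assuming the GPU load appears as a `GR3D_FREQ <n>%` field in the tegrastats output; the helper name `gpu_utilization` and the sample line are illustrative, not captured from this system.

```python
import re

# tegrastats is assumed to report GPU load as "GR3D_FREQ <percent>%",
# optionally followed by "@<frequency>".
GR3D = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_utilization(tegrastats_line):
    """Return the GPU load in percent from one tegrastats line, or None."""
    match = GR3D.search(tegrastats_line)
    return int(match.group(1)) if match else None


# Illustrative line only; real output contains more fields.
sample = "RAM 4181/15823MB CPU [12%@1190,8%@1190] GR3D_FREQ 99%@1377"
print(gpu_utilization(sample))
```

A value that stays near zero while inference is running would suggest the model is executing on the CPU.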

Thanks.

GPU utilization is 99%.
It works, but slowly.
However, there is a more up-to-date version named detectron2.


@AastaLLL The next question is how to get detectron to take input from a CSI sensor rather than a USB camera.
What might be the cause of the error below? Do I need a 32 GB Xavier to complete the execution?

  )
)
[12/16 14:33:11 fvcore.common.checkpoint]: Loading checkpoint from meshrcnn://meshrcnn_R50.pth
[12/16 14:33:12 meshrcnn.data.meshrcnn_transforms]: Loading models from pix3d_s1_test...
[12/16 14:39:55 meshrcnn.data.meshrcnn_transforms]: Unique objects loaded: 735
[12/16 14:39:55 meshrcnn.data.datasets.pix3d]: Loaded 2530 images in COCO format from datasets/pix3d/pix3d_s1_test.json
[12/16 14:39:55 d2.data.build]: Distribution of instances among all 9 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|    bed     | 213          |  bookcase  | 79           |   chair    | 1165         |
|    desk    | 154          |    misc    | 20           |    sofa    | 415          |
|   table    | 419          |    tool    | 11           |  wardrobe  | 54           |
|            |              |            |              |            |              |
|   total    | 2530         |            |              |            |              |
[12/16 14:39:55 d2.data.common]: Serializing 2530 elements to byte tensors and concatenating them all ...
[12/16 14:39:55 d2.data.common]: Serialized dataset takes 1.36 MiB
[12/16 14:39:55 meshrcnn.evaluation.pix3d_evaluation]: Loading unique objects from pix3d_s1_test...
[12/16 14:46:25 meshrcnn.evaluation.pix3d_evaluation]: Unique objects loaded: 735
[12/16 14:46:25 d2.evaluation.evaluator]: Start inference on 2530 images
/home/nvidia/detectron2/detectron2/modeling/roi_heads/fast_rcnn.py:124: UserWarning: This overload of nonzero is deprecated:
	nonzero()
Consider using one of the following signatures instead:
	nonzero(*, bool as_tuple) (Triggered internally at  ../torch/csrc/utils/python_arg_parser.cpp:882.)
  filter_inds = filter_mask.nonzero()
 python3 tools/train_net.py --config-file configs/pix3d/meshrcnn_R50_FPN.yaml --eval-only MODEL.WEIGHTS meshrcnn://meshrcnn_R50.pth
** fvcore version of PathManager will be deprecated soon. **
** Please migrate to the version in iopath repo. **
https://github.com/facebookresearch/iopath 

Command Line Args: Namespace(config_file='configs/pix3d/meshrcnn_R50_FPN.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=True, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.WEIGHTS', 'meshrcnn://meshrcnn_R50.pth'], resume=False)
[12/16 14:33:03 detectron2]: Rank of current process: 0. World size: 1
[12/16 14:33:05 detectron2]: Environment info:
----------------------  -----------------------------------------------------------------------------------------------------------------------
sys.platform            linux
Python                  3.6.9 (default, Oct  8 2020, 12:12:24) [GCC 8.4.0]
numpy                   1.18.5
detectron2              0.3 @/home/nvidia/detectron2/detectron2
Compiler                GCC 7.5
CUDA compiler           CUDA 10.2
detectron2 arch flags   7.2
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.7.0 @/home/nvidia/.local/lib/python3.6/site-packages/torch
PyTorch debug build     True
GPU available           True
GPU 0                   Xavier (arch=7.2)
CUDA_HOME               /usr/local/cuda-10.2
Pillow                  7.2.0
torchvision             0.8.0a0+45f960c @/usr/local/lib/python3.6/dist-packages/torchvision-0.8.0a0+45f960c-py3.6-linux-aarch64.egg/torchvision
torchvision arch flags  7.2
fvcore                  0.1.2.post20201216
cv2                     4.4.0
----------------------  -----------------------------------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72
  - CuDNN 8.0
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, USE_CUDA=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, 

[12/16 14:51:33 d2.evaluation.evaluator]: Inference done 397/2530. 0.5933 s / img. ETA=0:27:13
[12/16 14:51:40 d2.evaluation.evaluator]: Inference done 406/2530. 0.5946 s / img. ETA=0:27:07
[12/16 14:51:45 d2.evaluation.evaluator]: Inference done 412/2530. 0.5968 s / img. ETA=0:27:05
[12/16 14:51:52 d2.evaluation.evaluator]: Inference done 416/2530. 0.5976 s / img. ETA=0:27:25
[12/16 14:51:59 d2.evaluation.evaluator]: Inference done 424/2530. 0.5989 s / img. ETA=0:27:19
[12/16 14:52:04 d2.evaluation.evaluator]: Inference done 433/2530. 0.5984 s / img. ETA=0:27:04
[12/16 14:52:09 d2.evaluation.evaluator]: Inference done 438/2530. 0.5986 s / img. ETA=0:27:06
[12/16 14:52:14 d2.evaluation.evaluator]: Inference done 443/2530. 0.6019 s / img. ETA=0:27:07
[12/16 14:52:19 d2.evaluation.evaluator]: Inference done 450/2530. 0.6030 s / img. ETA=0:27:00
[12/16 14:52:25 d2.evaluation.evaluator]: Inference done 458/2530. 0.6036 s / img. ETA=0:26:50
[12/16 14:52:30 d2.evaluation.evaluator]: Inference done 465/2530. 0.6055 s / img. ETA=0:26:44
[12/16 14:52:36 d2.evaluation.evaluator]: Inference done 471/2530. 0.6081 s / img. ETA=0:26:48
[12/16 14:52:42 d2.evaluation.evaluator]: Inference done 477/2530. 0.6107 s / img. ETA=0:26:48
[12/16 14:52:48 d2.evaluation.evaluator]: Inference done 483/2530. 0.6140 s / img. ETA=0:26:47
[12/16 14:52:53 d2.evaluation.evaluator]: Inference done 487/2530. 0.6163 s / img. ETA=0:26:54
[12/16 14:53:00 d2.evaluation.evaluator]: Inference done 491/2530. 0.6217 s / img. ETA=0:27:04
[12/16 14:53:06 d2.evaluation.evaluator]: Inference done 493/2530. 0.6300 s / img. ETA=0:27:21
[12/16 14:53:11 d2.evaluation.evaluator]: Inference done 497/2530. 0.6347 s / img. ETA=0:27:25
[12/16 14:53:21 d2.evaluation.evaluator]: Inference done 500/2530. 0.6371 s / img. ETA=0:27:55
[12/16 14:53:27 d2.evaluation.evaluator]: Inference done 503/2530. 0.6398 s / img. ETA=0:28:06
[12/16 14:53:32 d2.evaluation.evaluator]: Inference done 509/2530. 0.6422 s / img. ETA=0:28:03
[12/16 14:53:37 d2.evaluation.evaluator]: Inference done 514/2530. 0.6451 s / img. ETA=0:28:02
[12/16 14:53:43 d2.evaluation.evaluator]: Inference done 519/2530. 0.6490 s / img. ETA=0:28:03
[12/16 14:53:49 d2.evaluation.evaluator]: Inference done 524/2530. 0.6532 s / img. ETA=0:28:05
[12/16 14:53:54 d2.evaluation.evaluator]: Inference done 528/2530. 0.6575 s / img. ETA=0:28:09
[12/16 14:53:59 d2.evaluation.evaluator]: Inference done 532/2530. 0.6621 s / img. ETA=0:28:13
Traceback (most recent call last):
  File "tools/train_net.py", line 108, in <module>
    args=(args,),
  File "/home/nvidia/detectron2/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "tools/train_net.py", line 91, in main
    res = Trainer.test(cfg, model)
  File "tools/train_net.py", line 60, in test
    results_i = inference_on_dataset(model, data_loader, evaluator)
  File "/home/nvidia/detectron2/detectron2/evaluation/evaluator.py", line 141, in inference_on_dataset
    outputs = model(inputs)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 149, in forward
    return self.inference(batched_inputs)
  File "/home/nvidia/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 207, in inference
    proposals, _ = self.proposal_generator(images, features, None)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/detectron2/detectron2/modeling/proposal_generator/rpn.py", line 449, in forward
    anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
  File "/home/nvidia/detectron2/detectron2/modeling/proposal_generator/rpn.py", line 482, in predict_proposals
    self.training,
  File "/home/nvidia/detectron2/detectron2/modeling/proposal_generator/proposal_utils.py", line 104, in find_top_rpn_proposals
    keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
  File "/home/nvidia/detectron2/detectron2/layers/nms.py", line 21, in batched_nms
    return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/jit/_trace.py", line 1100, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.8.0a0+45f960c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 88, in batched_nms
    keep = nms(boxes_for_nms, scores, iou_threshold)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.8.0a0+45f960c-py3.6-linux-aarch64.egg/torchvision/ops/boxes.py", line 42, in nms
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
  File "/home/nvidia/.local/lib/python3.6/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 19139) is killed by signal: Killed.

Hi,

I checked the repository's input handling quickly.
It seems to use OpenCV as the camera reader.

This explains why the pipeline is much slower, since OpenCV by default uses CPU-based FFmpeg as a decoder.
You can replace it with a GStreamer pipeline, as in the topic below:
https://forums.developer.nvidia.com/t/opencv-videocapture-performance-problem-with-gstreamer/71515/2

The GStreamer pipeline can also support CSI input directly through the nvarguscamerasrc component.

Thanks.

Thanks.
It worked with CSI after editing line 67
to the form below:

    cam = cv2.VideoCapture("nvarguscamerasrc ! video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! nvvidconv ! video/x-raw, format=BGRx ! videoconvert ! video/x-raw, format=BGR ! appsink")
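
For reuse, the pipeline string above can be built from parameters instead of being hard-coded. This is a minimal sketch, assuming OpenCV was built with GStreamer support; `csi_pipeline` and `open_csi_camera` are hypothetical helper names, and the elements and caps are the same ones used in the line above.

```python
def csi_pipeline(width=1280, height=720, fps=30):
    """Build the nvarguscamerasrc pipeline string for a CSI camera."""
    return (
        "nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width={width}, height={height}, "
        f"framerate={fps}/1 ! "
        "nvvidconv ! video/x-raw, format=BGRx ! "
        "videoconvert ! video/x-raw, format=BGR ! appsink"
    )


def open_csi_camera(width=1280, height=720, fps=30):
    """Open the CSI camera through OpenCV's GStreamer backend."""
    import cv2  # imported here so csi_pipeline works without OpenCV installed

    cam = cv2.VideoCapture(csi_pipeline(width, height, fps), cv2.CAP_GSTREAMER)
    if not cam.isOpened():
        raise RuntimeError("Could not open the CSI camera pipeline")
    return cam
```

Passing `cv2.CAP_GSTREAMER` explicitly asks OpenCV for the GStreamer backend rather than leaving the backend choice to auto-detection.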