SiamMask on Jetson Xavier NX, pytorch, slow FPS

gbenel · June 22, 2020, 9:58pm

I am running SiamMask (GitHub - foolwood/SiamMask: [CVPR2019] Fast Online Object Tracking and Segmentation: A Unifying Approach) on a Jetson Xavier NX, using pytorch.

I am currently getting a frame rate of about 5 FPS; I was hoping for something much faster.

I know PyTorch will be slower than TensorRT, but I was still expecting a much higher frame rate. It seems like it may be using only the CPU, not the GPU.
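
One way to test the CPU-only suspicion is to ask PyTorch directly. The following is a minimal sketch, assuming the tracker is a regular torch.nn.Module; describe_device is a hypothetical helper name, not part of SiamMask, and the import is guarded only so the snippet also runs where PyTorch is missing.

```python
try:
    import torch
except ImportError:  # keeps the sketch runnable on machines without PyTorch
    torch = None


def describe_device(model=None):
    """Return a short string saying where PyTorch would run `model`."""
    if torch is None:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "CUDA not available: PyTorch will run on the CPU"
    if model is not None:
        # A model stays on the CPU unless it was moved with .cuda() or .to("cuda").
        first_param = next(model.parameters(), None)
        if first_param is not None and not first_param.is_cuda:
            return "CUDA available, but the model weights are on the CPU"
    return "CUDA available: " + torch.cuda.get_device_name(0)


print(describe_device())
```

Passing the loaded tracker network as the argument also reveals the case where CUDA works but the weights were never moved to the GPU.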

Could anyone help me with this or give me some insight?

I installed PyTorch through the NVIDIA tool in jetson-inference/build. It reports version 1.4.0.

Thanks!

AastaLLL · June 23, 2020, 3:32am

Hi,

You can monitor the system status with tegrastats at the same time:

$ sudo tegrastats

If the GR3D utilization doesn’t reach 99%, your application still has room for acceleration.

RAM 2275/7764MB (lfb 729x4MB) … GR3D_FREQ 0%@1109 …
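
To watch this value while the tracker runs, the GR3D_FREQ field can be pulled out of each tegrastats line. This is a small sketch that assumes the field keeps the "GR3D_FREQ <percent>%@<frequency>" layout shown above; parse_gr3d is a hypothetical helper name.

```python
import re

# Matches e.g. "GR3D_FREQ 0%@1109"; the "@<frequency>" part is treated as optional.
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%(?:@(\d+))?")


def parse_gr3d(line):
    """Return (utilization_percent, frequency_or_None), or None if the field is absent."""
    match = GR3D_PATTERN.search(line)
    if match is None:
        return None
    utilization = int(match.group(1))
    frequency = int(match.group(2)) if match.group(2) else None
    return utilization, frequency


sample = "RAM 2275/7764MB (lfb 729x4MB) GR3D_FREQ 0%@1109"
print(parse_gr3d(sample))
```

Feeding it the lines printed by tegrastats gives one utilization sample per reporting interval.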

Another thing worth checking is the system clock and the device performance mode.
You can apply the following configuration to turn on all the CPU cores and maximize the clock rates.

$ sudo nvpmodel -m 2
$ sudo jetson_clocks

Thanks.

gbenel · June 23, 2020, 2:29pm

GR3D_FREQ ranges from 77 to 99%.

CPU and clock rates maximized.

AastaLLL · June 24, 2020, 3:15am

Hi,

It looks like the GPU utilization is quite high.

A better option is to convert your model to TensorRT.
TensorRT is optimized for the Jetson platform, so it is a better choice for inference on Jetson.

How about converting your model to ONNX format and giving it a quick try:

$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model]

This will give you a better idea of the Xavier NX's computational capacity.

Thanks.

gbenel · June 24, 2020, 4:21pm

I am trying that, and will update this thread with results.

I am not totally sure how I will then use the converted model with the SiamMask code, so I could use some guidance on that.

gbenel · July 1, 2020, 4:45pm

I tried converting to ONNX and got the following error:

$ /usr/src/tensorrt/bin/trtexec --onnx=model.pth
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth
[07/01/2020-12:43:58] [I] === Model Options ===
[07/01/2020-12:43:58] [I] Format: ONNX
[07/01/2020-12:43:58] [I] Model: model.pth
[07/01/2020-12:43:58] [I] Output:
[07/01/2020-12:43:58] [I] === Build Options ===
[07/01/2020-12:43:58] [I] Max batch: 1
[07/01/2020-12:43:58] [I] Workspace: 16 MB
[07/01/2020-12:43:58] [I] minTiming: 1
[07/01/2020-12:43:58] [I] avgTiming: 8
[07/01/2020-12:43:58] [I] Precision: FP32
[07/01/2020-12:43:58] [I] Calibration: 
[07/01/2020-12:43:58] [I] Safe mode: Disabled
[07/01/2020-12:43:58] [I] Save engine: 
[07/01/2020-12:43:58] [I] Load engine: 
[07/01/2020-12:43:58] [I] Builder Cache: Enabled
[07/01/2020-12:43:58] [I] NVTX verbosity: 0
[07/01/2020-12:43:58] [I] Inputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Outputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Input build shapes: model
[07/01/2020-12:43:58] [I] Input calibration shapes: model
[07/01/2020-12:43:58] [I] === System Options ===
[07/01/2020-12:43:58] [I] Device: 0
[07/01/2020-12:43:58] [I] DLACore: 
[07/01/2020-12:43:58] [I] Plugins:
[07/01/2020-12:43:58] [I] === Inference Options ===
[07/01/2020-12:43:58] [I] Batch: 1
[07/01/2020-12:43:58] [I] Input inference shapes: model
[07/01/2020-12:43:58] [I] Iterations: 10
[07/01/2020-12:43:58] [I] Duration: 3s (+ 200ms warm up)
[07/01/2020-12:43:58] [I] Sleep time: 0ms
[07/01/2020-12:43:58] [I] Streams: 1
[07/01/2020-12:43:58] [I] ExposeDMA: Disabled
[07/01/2020-12:43:58] [I] Spin-wait: Disabled
[07/01/2020-12:43:58] [I] Multithreading: Disabled
[07/01/2020-12:43:58] [I] CUDA Graph: Disabled
[07/01/2020-12:43:58] [I] Skip inference: Disabled
[07/01/2020-12:43:58] [I] Inputs:
[07/01/2020-12:43:58] [I] === Reporting Options ===
[07/01/2020-12:43:58] [I] Verbose: Disabled
[07/01/2020-12:43:58] [I] Averages: 10 inferences
[07/01/2020-12:43:58] [I] Percentile: 99
[07/01/2020-12:43:58] [I] Dump output: Disabled
[07/01/2020-12:43:58] [I] Profile: Disabled
[07/01/2020-12:43:58] [I] Export timing to JSON file: 
[07/01/2020-12:43:58] [I] Export output to JSON file: 
[07/01/2020-12:43:58] [I] Export profile to JSON file: 
[07/01/2020-12:43:58] [I] 
----------------------------------------------------------------
Input filename:   model.pth
ONNX IR version:  0.0.0
Opset version:    0
Producer name:    
Producer version: 
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[07/01/2020-12:44:00] [E] [TRT] Network must have at least one output
[07/01/2020-12:44:00] [E] [TRT] Network validation failed.
[07/01/2020-12:44:00] [E] Engine creation failed
[07/01/2020-12:44:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth
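
The header in this log reports ONNX IR version 0.0.0 and an empty producer name, which is consistent with trtexec being handed a PyTorch checkpoint (model.pth) instead of an exported ONNX file. A small guard can catch that before launching trtexec. This is only a sketch: build_trtexec_cmd is a hypothetical helper, the check looks at the file extension alone, and --fp16 and --saveEngine are the trtexec options assumed to be wanted.

```python
from pathlib import Path

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"  # path used in this thread


def build_trtexec_cmd(model_path, fp16=False, engine_path=None):
    """Build the trtexec argument list, refusing inputs that are not .onnx files."""
    path = Path(model_path)
    if path.suffix.lower() != ".onnx":
        raise ValueError(f"{path.name} does not look like an ONNX file; export it first")
    cmd = [TRTEXEC, f"--onnx={path}"]
    if fp16:
        cmd.append("--fp16")  # allow half-precision kernels
    if engine_path:
        cmd.append(f"--saveEngine={engine_path}")  # keep the built engine on disk
    return cmd


print(build_trtexec_cmd("model.onnx", fp16=True))
```

The returned list can be passed to subprocess.run on the Jetson; nothing is executed by the helper itself.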

gbenel · July 12, 2020, 3:28am

I am trying this code (from: pysot model.pth to onnx occurred error · Issue #125 · STVIR/pysot · GitHub) to convert a SiamRPN model (not a SiamMask model).
It looks like an ONNX model was created, but I’m not sure it really worked. Converting to TensorRT fails, as shown after this code.

import torch
import torchvision.models as models
import torch.nn as nn
import argparse
import torchvision
import torch.onnx

from collections import OrderedDict
from pysot.core.config import cfg
from pysot.models.model_builder import ModelBuilder

parser = argparse.ArgumentParser(description='trans demo')
parser.add_argument('--config', type=str, help='config file')
parser.add_argument('--snapshot', type=str, help='model name')
args = parser.parse_args()

cfg.merge_from_file(args.config)
cfg.CUDA = torch.cuda.is_available() and cfg.CUDA
device = torch.device('cuda' if cfg.CUDA else 'cpu')

class ConvertModel(nn.Module):
    def __init__(self, model0):
        super(ConvertModel, self).__init__()
        self.model = model0

    def forward(self, template, search):
        zf = self.model.backbone(template)
        xf = self.model.backbone(search)
        cls, loc = self.model.rpn_head(zf, xf)
        return cls, loc


model0 = ModelBuilder()
model0.load_state_dict(torch.load(args.snapshot, map_location=lambda storage, loc: storage.cpu()))

model0.eval()
print(model0)
model = ConvertModel(model0)

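# Dummy inputs that fix the exported input shapes:
# x stands in for the template crop, z for the search-region crop.
x = torch.randn(1, 3, 127, 127)
z = torch.randn(1, 3, 287, 287)

# torch.onnx.export is the public entry point (torch.onnx._export is a private helper).
torch.onnx.export(model, (x, z), "model.onnx", export_params=True)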

The file “model.onnx” was created.

    /usr/src/tensorrt/bin/trtexec --onnx=model.onnx
    &&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx
    [07/11/2020-22:13:09] [I] === Model Options ===
    [07/11/2020-22:13:09] [I] Format: ONNX
    [07/11/2020-22:13:09] [I] Model: model.onnx
    [07/11/2020-22:13:09] [I] Output:
    [07/11/2020-22:13:09] [I] === Build Options ===
    [07/11/2020-22:13:09] [I] Max batch: 1
    [07/11/2020-22:13:09] [I] Workspace: 16 MB
    [07/11/2020-22:13:09] [I] minTiming: 1
    [07/11/2020-22:13:09] [I] avgTiming: 8
    [07/11/2020-22:13:09] [I] Precision: FP32
    [07/11/2020-22:13:09] [I] Calibration: 
    [07/11/2020-22:13:09] [I] Safe mode: Disabled
    [07/11/2020-22:13:09] [I] Save engine: 
    [07/11/2020-22:13:09] [I] Load engine: 
    [07/11/2020-22:13:09] [I] Builder Cache: Enabled
    [07/11/2020-22:13:09] [I] NVTX verbosity: 0
    [07/11/2020-22:13:09] [I] Inputs format: fp32:CHW
    [07/11/2020-22:13:09] [I] Outputs format: fp32:CHW
    [07/11/2020-22:13:09] [I] Input build shapes: model
    [07/11/2020-22:13:09] [I] Input calibration shapes: model
    [07/11/2020-22:13:09] [I] === System Options ===
    [07/11/2020-22:13:09] [I] Device: 0
    [07/11/2020-22:13:09] [I] DLACore: 
    [07/11/2020-22:13:09] [I] Plugins:
    [07/11/2020-22:13:09] [I] === Inference Options ===
    [07/11/2020-22:13:09] [I] Batch: 1
    [07/11/2020-22:13:09] [I] Input inference shapes: model
    [07/11/2020-22:13:09] [I] Iterations: 10
    [07/11/2020-22:13:09] [I] Duration: 3s (+ 200ms warm up)
    [07/11/2020-22:13:09] [I] Sleep time: 0ms
    [07/11/2020-22:13:09] [I] Streams: 1
    [07/11/2020-22:13:09] [I] ExposeDMA: Disabled
    [07/11/2020-22:13:09] [I] Spin-wait: Disabled
    [07/11/2020-22:13:09] [I] Multithreading: Disabled
    [07/11/2020-22:13:09] [I] CUDA Graph: Disabled
    [07/11/2020-22:13:09] [I] Skip inference: Disabled
    [07/11/2020-22:13:09] [I] Inputs:
    [07/11/2020-22:13:09] [I] === Reporting Options ===
    [07/11/2020-22:13:09] [I] Verbose: Disabled
    [07/11/2020-22:13:09] [I] Averages: 10 inferences
    [07/11/2020-22:13:09] [I] Percentile: 99
    [07/11/2020-22:13:09] [I] Dump output: Disabled
    [07/11/2020-22:13:09] [I] Profile: Disabled
    [07/11/2020-22:13:09] [I] Export timing to JSON file: 
    [07/11/2020-22:13:09] [I] Export output to JSON file: 
    [07/11/2020-22:13:09] [I] Export profile to JSON file: 
    [07/11/2020-22:13:09] [I] 
    ----------------------------------------------------------------
    Input filename:   model.onnx
    ONNX IR version:  0.0.4
    Opset version:    9
    Producer name:    pytorch
    Producer version: 1.3
    Domain:           
    Model version:    0
    Doc string:       
    ----------------------------------------------------------------
    [07/11/2020-22:13:11] [W] [TRT] onnx2trt_utils.cpp:217: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
    ERROR: onnx2trt_utils.cpp:1523 In function convMultiInput:
    [8] Assertion failed: filter_dim.d[nbSpatialDims - i] == kernel_tensor_ptr->getDimensions().d[kernel_tensor_ptr->getDimensions().nbDims - i]
    [07/11/2020-22:13:11] [E] Failed to parse onnx file
    [07/11/2020-22:13:11] [E] Parsing model failed
    [07/11/2020-22:13:11] [E] Engine creation failed
    [07/11/2020-22:13:11] [E] Engine set up failed
    &&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx

AastaLLL · July 15, 2020, 5:10am

Hi,

Please note that you need to convert the model to ONNX format before feeding it into TensorRT.
You can follow this page to convert the .pth file into .onnx first:
https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html

Thanks.

gbenel · July 15, 2020, 10:50am

Hi @AastaLLL, please read through the post above; I did create the ONNX model.

AastaLLL · July 16, 2020, 2:55am

Hi,

Sorry for that.
The assertion comes from here:

https://github.com/onnx/onnx-tensorrt/blob/main/onnx2trt_utils.cpp#L1739

        {
            return &tensor;
        }
        nvinfer1::IShuffleLayer* layer = ctx->network()->addShuffle(tensor);
        if (!layer)
        {
            return nullptr;
        }
        layer->setReshapeDimensions(shape);
        layer->setZeroIsPlaceholder(false);
        return layer->getOutput(0);
    }

    NodeImportResult scaleHelper(IImporterContext* ctx, const ::ONNX_NAMESPACE::NodeProto& node, nvinfer1::ITensor& tensor_,
        nvinfer1::ScaleMode mode, const nvinfer1::Weights& shift, const nvinfer1::Weights& scale,
        const nvinfer1::Weights& power, const char* shiftName, const char* scaleName)
    {
        nvinfer1::ITensor* tensorPtr = &tensor_;
        const ShapeTensor origShape = shapeOf(*tensorPtr);

        // TensorRT scale layers support 4D(NCHW) or 5D(NCDHW) input.

This suggests that SiamRPN includes an operation that the ONNX parser does not support.
More precisely, it looks like the spatial dimensions and the kernel dimensions do not match in one of the convolution operations.

Here is the detailed operator support matrix for your reference:
https://github.com/onnx/onnx-tensorrt/blob/master/operators.md
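
One plausible reading, offered as an assumption rather than a confirmed diagnosis: Siamese trackers correlate the search-region features with the template features, so the convolution kernel is itself a tensor computed at run time instead of a fixed weight. The function name convMultiInput in the error suggests the parser hit such a convolution. The pure-Python sketch below only illustrates that operation on tiny nested lists; xcorr2d and xcorr_depthwise are illustrative names, not the pysot code.

```python
def xcorr2d(search, kernel):
    """Valid (no padding) 2-D cross-correlation of a single channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(search) - kh + 1
    out_w = len(search[0]) - kw + 1
    return [
        [
            sum(search[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def xcorr_depthwise(search_feat, template_feat):
    """Correlate each search channel with the matching template channel.

    The "kernel" here is the template feature map, i.e. data, not a learned weight.
    """
    return [xcorr2d(s, t) for s, t in zip(search_feat, template_feat)]


search_feat = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # 1 channel, 3x3
template_feat = [[[1, 0], [0, 1]]]                 # 1 channel, 2x2
print(xcorr_depthwise(search_feat, template_feat))  # [[[6, 8], [12, 14]]]
```

If this is indeed the failing layer, a workaround sometimes used is to export only the parts with fixed weights (backbone and heads) and perform the correlation outside the engine; whether that fits this model is untested here.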

Thanks.

gbenel · July 16, 2020, 1:14pm

OK, so you think there is currently no way to convert this model to TensorRT?

AastaLLL · July 17, 2020, 2:14am

Hi,

We don’t have much experience with SiamMask.
We recommend first checking whether the network contains any layers that TensorRT or onnx2trt does not support.

Thanks.

gbenel · July 18, 2020, 2:00pm

Do you have any recommendations for single-object trackers (preferably with a user-selectable ROI) that are optimized for Jetson?
I have experimented with some of the trackers in DeepStream, but they are all object-detection based and built for multi-object tracking, which leads to a number of issues, mostly related to ID switching.

AastaLLL · July 29, 2020, 6:03am

Hi,

You can check whether the “KLT Bounding Box Tracker” in our VPI SDK meets your requirements.
https://docs.nvidia.com/vpi/algo_klt_tracker.html
It uses the same tracking algorithm but takes bounding boxes as input, so it can be customized.

Thanks.

gbenel · July 29, 2020, 11:47am

Thank you! Is there any python implementation of this for Jetson?

AastaLLL · July 30, 2020, 4:51am

Hi,

Sorry, our VPI library currently has only a C++ API.
Thanks.

gbenel · July 30, 2020, 4:58pm

I see that VPI has to be installed through the SDK Manager, as installing VPI through apt is only possible on linux-x86_64 systems (not aarch64).
Unfortunately, I do not have a Linux machine to run the SDK Manager on, and the SDK Manager cannot run on the Jetson device itself.

AastaLLL · July 31, 2020, 4:35am

Hi,

VPI is installed when you flash the NX with sdkmanager.
You can find the samples on the device here:

/opt/nvidia/vpi-0.3/samples

Thanks.

gbenel · July 31, 2020, 1:35pm

Found it, thank you!

gbenel · July 31, 2020, 3:24pm

The KLT tracker sample looks and runs great, but it is difficult for me to use: it is C++ only, it writes its output to saved frames (not a live stream), and it needs a text file with the bounding boxes for the initial frame.
Ideally, I would use it from Python with OpenCV, selecting an ROI on the first frame and taking camera images as input.
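
The workflow described here (camera input, ROI drawn on the first frame, Python) can be sketched with OpenCV alone. Note the swap: this uses OpenCV's pyramidal Lucas-Kanade optical flow on the CPU, not the VPI KLT Bounding Box Tracker, and it only translates the box without rescaling it. shift_box and main are hypothetical names, and camera index 0 is an assumption.

```python
from statistics import median


def shift_box(box, old_pts, new_pts):
    """Translate an (x, y, w, h) box by the median motion of its tracked points."""
    if not old_pts:
        return box  # nothing was tracked; keep the previous box
    dx = median(n[0] - o[0] for o, n in zip(old_pts, new_pts))
    dy = median(n[1] - o[1] for o, n in zip(old_pts, new_pts))
    x, y, w, h = box
    return (x + dx, y + dy, w, h)


def main(camera_index=0):
    # Imported here so shift_box stays usable without OpenCV installed.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError("could not read from camera")

    # Let the user draw the initial ROI on the first frame.
    box = cv2.selectROI("select", frame, False)
    cv2.destroyWindow("select")
    x, y, w, h = (int(v) for v in box)

    # Pick corner features inside the ROI only.
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, 100, 0.01, 5, mask=mask)

    while pts is not None and len(pts) > 0:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good = status.ravel() == 1  # keep only points the flow could follow
        old_ok = pts[good].reshape(-1, 2)
        new_ok = new_pts[good].reshape(-1, 2)
        box = shift_box(box, old_ok.tolist(), new_ok.tolist())
        x, y, w, h = (int(round(v)) for v in box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("track", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
        prev_gray, pts = gray, new_ok.reshape(-1, 1, 2)

    cap.release()
    cv2.destroyAllWindows()
```

Calling main() on the device starts the loop; tracking stops when every feature point is lost, at which point a real application would need to re-detect features or ask for a new ROI.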