SiamMask on Jetson Xavier NX with PyTorch: slow FPS

I am running SiamMask (https://github.com/foolwood/SiamMask) on a Jetson Xavier NX using PyTorch.

I am currently getting about 5 FPS; I was hoping for much more.

I know PyTorch will be slower than TensorRT, but I still expected a much higher frame rate. It seems like it may only be using the CPU, not the GPU.
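CUDA kernels run asynchronously, so a per-frame timer around PyTorch code can be misleading unless the GPU is synchronized before the clock is read. A minimal sketch of an FPS helper, assuming you pass it a callable that processes one frame (`measure_fps`, `step` and `sync` are hypothetical names, not part of SiamMask):

```python
import time
from typing import Callable, Optional


def measure_fps(step: Callable[[], None], n_frames: int = 100,
                warmup: int = 10,
                sync: Optional[Callable[[], None]] = None) -> float:
    """Call `step` n_frames times and return the measured frames per second.

    `sync` should block until all queued GPU work is done (for PyTorch,
    torch.cuda.synchronize). Without it the timer may only measure how fast
    kernels are launched, not how fast they actually run.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # Warm-up iterations absorb one-off costs (cuDNN autotuning, allocations).
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # start timing from an idle GPU
    start = time.perf_counter()
    for _ in range(n_frames):
        step()
    if sync is not None:
        sync()  # include all queued work in the measurement
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return n_frames / elapsed
```

On the Jetson this could be called as `measure_fps(lambda: tracker_step(frame), sync=torch.cuda.synchronize)`, where `tracker_step` stands for your own per-frame call. Checking `torch.cuda.is_available()` and `next(model.parameters()).device` also shows whether the model actually lives on the GPU.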

Could anyone help me with this or give me some insight?

I installed PyTorch with the NVIDIA tool in jetson-inference/build, and it reports version 1.4.0.

Thanks!

Hi,

You can monitor the system status with tegrastats at the same time:

$ sudo tegrastats

If the GR3D (GPU) utilization doesn’t reach 99%, your application still has room for acceleration.

RAM 2275/7764MB (lfb 729x4MB) … GR3D_FREQ 0%@1109
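To log GPU load while the tracker runs, the tegrastats output can be fed through a small parser. A sketch, assuming the `GR3D_FREQ <util>%@<clock>` field format shown in the sample line above (the other fields vary between JetPack releases, and `gr3d_usage` is a hypothetical helper):

```python
import re
from typing import Optional, Tuple

# Matches fields such as "GR3D_FREQ 0%@1109": GPU utilisation in percent,
# optionally followed by "@" and the current GPU clock.
_GR3D = re.compile(r"GR3D_FREQ\s+(\d+)%(?:@(\d+))?")


def gr3d_usage(line: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (gpu_utilisation_percent, gpu_clock) from one tegrastats line.

    Returns None if the line has no GR3D_FREQ field; the clock is None when
    the line reports only the percentage.
    """
    m = _GR3D.search(line)
    if m is None:
        return None
    clock = int(m.group(2)) if m.group(2) is not None else None
    return int(m.group(1)), clock
```

The lines can come from `subprocess.Popen(["tegrastats"], stdout=subprocess.PIPE, text=True)` (run with the same privileges as `sudo tegrastats`), giving a GR3D trace to compare against the tracker's frame times.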

Another thing worth checking is the system clock and power mode.
The following commands turn on all the CPU cores and maximize the clock rates:

$ sudo nvpmodel -m 2
$ sudo jetson_clocks

Thanks.

GR3D_FREQ ranges from 77 to 99%.

CPU and clock rates maximized.

Hi,

It looks like the GPU utilization is quite high.

A better option is to convert your model to TensorRT.
TensorRT is optimized for the Jetson platform, so it is a better choice for inference on Jetson.

How about converting your model to ONNX format and giving it a quick try:

$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model]

This will give you a better idea of the Xavier NX's computational capacity.
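When scripting this, it helps to save the built engine so the application can load it later instead of rebuilding it each time. A minimal Python sketch that assembles the command line; `--onnx`, `--saveEngine`, `--fp16` and `--verbose` are standard trtexec options, while `trtexec_cmd` and `run_trtexec` are hypothetical helpers:

```python
import shlex
import subprocess
from typing import List

TRTEXEC = "/usr/src/tensorrt/bin/trtexec"  # path used in this thread


def trtexec_cmd(onnx_path: str, engine_path: str = "", fp16: bool = False,
                verbose: bool = False) -> List[str]:
    """Build the trtexec argument list for an ONNX model."""
    cmd = [TRTEXEC, "--onnx=" + onnx_path]
    if engine_path:
        cmd.append("--saveEngine=" + engine_path)  # serialize the built engine
    if fp16:
        cmd.append("--fp16")  # also allow FP16 kernels
    if verbose:
        cmd.append("--verbose")  # detailed parser/builder messages
    return cmd


def run_trtexec(cmd: List[str]) -> int:
    """Echo the command in shell form, run it and return its exit code."""
    print("$", " ".join(shlex.quote(part) for part in cmd))
    return subprocess.run(cmd).returncode
```

For example, `run_trtexec(trtexec_cmd("model.onnx", engine_path="model.engine", fp16=True))`. The `--verbose` output is also the quickest way to see which ONNX node a parser error refers to.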

Thanks.

I am trying that, and will update this thread with results.

I am not sure how I would then use the converted model with the SiamMask code; I could use some guidance on that.

I tried converting to ONNX and got the following error:

$ /usr/src/tensorrt/bin/trtexec --onnx=model.pth
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth
[07/01/2020-12:43:58] [I] === Model Options ===
[07/01/2020-12:43:58] [I] Format: ONNX
[07/01/2020-12:43:58] [I] Model: model.pth
[07/01/2020-12:43:58] [I] Output:
[07/01/2020-12:43:58] [I] === Build Options ===
[07/01/2020-12:43:58] [I] Max batch: 1
[07/01/2020-12:43:58] [I] Workspace: 16 MB
[07/01/2020-12:43:58] [I] minTiming: 1
[07/01/2020-12:43:58] [I] avgTiming: 8
[07/01/2020-12:43:58] [I] Precision: FP32
[07/01/2020-12:43:58] [I] Calibration: 
[07/01/2020-12:43:58] [I] Safe mode: Disabled
[07/01/2020-12:43:58] [I] Save engine: 
[07/01/2020-12:43:58] [I] Load engine: 
[07/01/2020-12:43:58] [I] Builder Cache: Enabled
[07/01/2020-12:43:58] [I] NVTX verbosity: 0
[07/01/2020-12:43:58] [I] Inputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Outputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Input build shapes: model
[07/01/2020-12:43:58] [I] Input calibration shapes: model
[07/01/2020-12:43:58] [I] === System Options ===
[07/01/2020-12:43:58] [I] Device: 0
[07/01/2020-12:43:58] [I] DLACore: 
[07/01/2020-12:43:58] [I] Plugins:
[07/01/2020-12:43:58] [I] === Inference Options ===
[07/01/2020-12:43:58] [I] Batch: 1
[07/01/2020-12:43:58] [I] Input inference shapes: model
[07/01/2020-12:43:58] [I] Iterations: 10
[07/01/2020-12:43:58] [I] Duration: 3s (+ 200ms warm up)
[07/01/2020-12:43:58] [I] Sleep time: 0ms
[07/01/2020-12:43:58] [I] Streams: 1
[07/01/2020-12:43:58] [I] ExposeDMA: Disabled
[07/01/2020-12:43:58] [I] Spin-wait: Disabled
[07/01/2020-12:43:58] [I] Multithreading: Disabled
[07/01/2020-12:43:58] [I] CUDA Graph: Disabled
[07/01/2020-12:43:58] [I] Skip inference: Disabled
[07/01/2020-12:43:58] [I] Inputs:
[07/01/2020-12:43:58] [I] === Reporting Options ===
[07/01/2020-12:43:58] [I] Verbose: Disabled
[07/01/2020-12:43:58] [I] Averages: 10 inferences
[07/01/2020-12:43:58] [I] Percentile: 99
[07/01/2020-12:43:58] [I] Dump output: Disabled
[07/01/2020-12:43:58] [I] Profile: Disabled
[07/01/2020-12:43:58] [I] Export timing to JSON file: 
[07/01/2020-12:43:58] [I] Export output to JSON file: 
[07/01/2020-12:43:58] [I] Export profile to JSON file: 
[07/01/2020-12:43:58] [I] 
----------------------------------------------------------------
Input filename:   model.pth
ONNX IR version:  0.0.0
Opset version:    0
Producer name:    
Producer version: 
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[07/01/2020-12:44:00] [E] [TRT] Network must have at least one output
[07/01/2020-12:44:00] [E] [TRT] Network validation failed.
[07/01/2020-12:44:00] [E] Engine creation failed
[07/01/2020-12:44:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth

I am trying this code (from https://github.com/STVIR/pysot/issues/125) to convert a SiamRPN model (not SiamMask).
It looks like an ONNX model was created, but I’m not sure it really worked: converting it to TensorRT fails, as shown after the code.

import argparse

import torch
import torch.nn as nn
import torch.onnx

from pysot.core.config import cfg
from pysot.models.model_builder import ModelBuilder

parser = argparse.ArgumentParser(description='Export a pysot SiamRPN model to ONNX')
parser.add_argument('--config', type=str, required=True, help='config file')
parser.add_argument('--snapshot', type=str, required=True, help='model checkpoint (.pth)')
args = parser.parse_args()

cfg.merge_from_file(args.config)

class ConvertModel(nn.Module):
    """Runs the template and search branches in a single forward() for export."""
    def __init__(self, model0):
        super(ConvertModel, self).__init__()
        self.model = model0

    def forward(self, template, search):
        zf = self.model.backbone(template)
        xf = self.model.backbone(search)
        # Mirrors ModelBuilder.template()/track(): configs with ADJUST enabled
        # (e.g. the ResNet-50 ones) pass the features through the neck first.
        if cfg.ADJUST.ADJUST:
            zf = self.model.neck(zf)
            xf = self.model.neck(xf)
        cls, loc = self.model.rpn_head(zf, xf)
        return cls, loc


model0 = ModelBuilder()
# Load the checkpoint onto the CPU regardless of the device it was saved from.
model0.load_state_dict(torch.load(args.snapshot, map_location='cpu'))
model0.eval()
model = ConvertModel(model0)

# Dummy inputs: 127x127 template crop and 287x287 search crop.
template = torch.randn(1, 3, 127, 127)
search = torch.randn(1, 3, 287, 287)

# torch.onnx.export is the public API (torch.onnx._export is private).
# Named inputs/outputs make the graph easier to inspect and to bind in TensorRT.
torch.onnx.export(model, (template, search), "model.onnx",
                  export_params=True,
                  input_names=['template', 'search'],
                  output_names=['cls', 'loc'])

The file “model.onnx” was created.

    /usr/src/tensorrt/bin/trtexec --onnx=model.onnx
    &&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx
    [07/11/2020-22:13:09] [I] === Model Options ===
    [07/11/2020-22:13:09] [I] Format: ONNX
    [07/11/2020-22:13:09] [I] Model: model.onnx
    [07/11/2020-22:13:09] [I] Output:
    [07/11/2020-22:13:09] [I] === Build Options ===
    [07/11/2020-22:13:09] [I] Max batch: 1
    [07/11/2020-22:13:09] [I] Workspace: 16 MB
    [07/11/2020-22:13:09] [I] minTiming: 1
    [07/11/2020-22:13:09] [I] avgTiming: 8
    [07/11/2020-22:13:09] [I] Precision: FP32
    [07/11/2020-22:13:09] [I] Calibration: 
    [07/11/2020-22:13:09] [I] Safe mode: Disabled
    [07/11/2020-22:13:09] [I] Save engine: 
    [07/11/2020-22:13:09] [I] Load engine: 
    [07/11/2020-22:13:09] [I] Builder Cache: Enabled
    [07/11/2020-22:13:09] [I] NVTX verbosity: 0
    [07/11/2020-22:13:09] [I] Inputs format: fp32:CHW
    [07/11/2020-22:13:09] [I] Outputs format: fp32:CHW
    [07/11/2020-22:13:09] [I] Input build shapes: model
    [07/11/2020-22:13:09] [I] Input calibration shapes: model
    [07/11/2020-22:13:09] [I] === System Options ===
    [07/11/2020-22:13:09] [I] Device: 0
    [07/11/2020-22:13:09] [I] DLACore: 
    [07/11/2020-22:13:09] [I] Plugins:
    [07/11/2020-22:13:09] [I] === Inference Options ===
    [07/11/2020-22:13:09] [I] Batch: 1
    [07/11/2020-22:13:09] [I] Input inference shapes: model
    [07/11/2020-22:13:09] [I] Iterations: 10
    [07/11/2020-22:13:09] [I] Duration: 3s (+ 200ms warm up)
    [07/11/2020-22:13:09] [I] Sleep time: 0ms
    [07/11/2020-22:13:09] [I] Streams: 1
    [07/11/2020-22:13:09] [I] ExposeDMA: Disabled
    [07/11/2020-22:13:09] [I] Spin-wait: Disabled
    [07/11/2020-22:13:09] [I] Multithreading: Disabled
    [07/11/2020-22:13:09] [I] CUDA Graph: Disabled
    [07/11/2020-22:13:09] [I] Skip inference: Disabled
    [07/11/2020-22:13:09] [I] Inputs:
    [07/11/2020-22:13:09] [I] === Reporting Options ===
    [07/11/2020-22:13:09] [I] Verbose: Disabled
    [07/11/2020-22:13:09] [I] Averages: 10 inferences
    [07/11/2020-22:13:09] [I] Percentile: 99
    [07/11/2020-22:13:09] [I] Dump output: Disabled
    [07/11/2020-22:13:09] [I] Profile: Disabled
    [07/11/2020-22:13:09] [I] Export timing to JSON file: 
    [07/11/2020-22:13:09] [I] Export output to JSON file: 
    [07/11/2020-22:13:09] [I] Export profile to JSON file: 
    [07/11/2020-22:13:09] [I] 
    ----------------------------------------------------------------
    Input filename:   model.onnx
    ONNX IR version:  0.0.4
    Opset version:    9
    Producer name:    pytorch
    Producer version: 1.3
    Domain:           
    Model version:    0
    Doc string:       
    ----------------------------------------------------------------
    [07/11/2020-22:13:11] [W] [TRT] onnx2trt_utils.cpp:217: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
    ERROR: onnx2trt_utils.cpp:1523 In function convMultiInput:
    [8] Assertion failed: filter_dim.d[nbSpatialDims - i] == kernel_tensor_ptr->getDimensions().d[kernel_tensor_ptr->getDimensions().nbDims - i]
    [07/11/2020-22:13:11] [E] Failed to parse onnx file
    [07/11/2020-22:13:11] [E] Parsing model failed
    [07/11/2020-22:13:11] [E] Engine creation failed
    [07/11/2020-22:13:11] [E] Engine set up failed
    &&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx

Hi,

Please note that TensorRT needs the model in ONNX format; trtexec cannot read a .pth file directly.
You can follow this page to convert the .pth file to .onnx first:
https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html

Thanks.
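The first error in the thread (`Network must have at least one output`, with `ONNX IR version: 0.0.0`) is most likely what trtexec reports when a PyTorch checkpoint is passed to `--onnx`: the parser finds no ONNX graph in it. A small heuristic check before calling trtexec, assuming the two file layouts `torch.save` has used (a zip archive since PyTorch 1.6, a protocol-2 pickle stream before that); `looks_like_torch_checkpoint` is a hypothetical helper:

```python
# Leading bytes of the two formats torch.save has written:
#   PyTorch >= 1.6: zip archive, starting with b"PK\x03\x04"
#   older releases: pickle protocol 2, starting with b"\x80\x02"
_ZIP_MAGIC = b"PK\x03\x04"
_PICKLE2_MAGIC = b"\x80\x02"


def looks_like_torch_checkpoint(path: str) -> bool:
    """Heuristic: True if the file starts like a torch.save() output.

    An ONNX file is a protobuf message and does not start with either
    signature, so a True result means the file should be exported with
    torch.onnx.export before it is given to trtexec.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    return head.startswith(_ZIP_MAGIC) or head.startswith(_PICKLE2_MAGIC)
```

This is only a sniff of the first bytes; a full validation of the exported file would use the `onnx` package (`onnx.load` followed by `onnx.checker.check_model`).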

Hi @AastaLLL, please read through the post above: I did create the ONNX model.

Hi,

Sorry for that.
The assertion occurs from here:

This indicates that SiamRPN includes an operation that is not supported.
More precisely, it looks like the spatial dimensions of the filter and of the kernel tensor do not match in one of the convolution operations.
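A likely reason this convolution is special: in pysot-style SiamRPN models, the RPN head correlates the search-image features with the template features channel by channel (depthwise cross-correlation), so the convolution "kernel" is an activation computed at run time rather than a stored weight. pysot implements this with `F.conv2d(..., groups=...)`, which is presumably exported as a Conv node whose weight is a tensor input, the case handled by `convMultiInput` in the assertion. A plain-Python illustration of the operation itself (function names are hypothetical; this is not pysot's code):

```python
from typing import List

Matrix = List[List[float]]


def xcorr_2d(search: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2-D cross-correlation of one channel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(search) - kh + 1, len(search[0]) - kw + 1
    if oh <= 0 or ow <= 0:
        raise ValueError("kernel is larger than the search map")
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(search[i + u][j + v] * kernel[u][v]
                            for u in range(kh) for v in range(kw))
    return out


def xcorr_depthwise(search: List[Matrix], template: List[Matrix]) -> List[Matrix]:
    """Correlate channel c of the search features with channel c of the
    template features. The 'weights' come from the template image at run
    time, unlike an ordinary convolution with fixed learned weights."""
    if len(search) != len(template):
        raise ValueError("search and template must have the same channel count")
    return [xcorr_2d(s, t) for s, t in zip(search, template)]
```

Because the template features change with every target, a converter has to treat the kernel as a second network input, which is exactly the path where the filter and kernel-tensor dimensions are compared.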

Here is the detailed support matrix for your reference:

Thanks.

OK, so you think there is currently no way to convert this model to TensorRT?

Hi,

We don’t have much experience with SiamMask.
We recommend first checking whether the network contains any layers that TensorRT or onnx2trt cannot support.

Thanks.

Do you have any recommendations for single-object trackers (preferably with a user-selectable ROI) that are optimized for Jetson?
I have experimented with some of the DeepStream trackers, but they are all detection-based multi-object trackers, which leads to a number of issues, mostly related to ID switching.

Hi,

You can check whether the “KLT Bounding Box Tracker” in our VPI SDK meets your requirements:
https://docs.nvidia.com/vpi/algo_klt_tracker.html
It uses the same tracking algorithm but takes the bounding box as input, so it can be customized.

Thanks.

Thank you! Is there a Python implementation of this for Jetson?

Hi,

Sorry, our VPI library currently only has a C++ API.
Thanks.

I see that VPI has to be installed through SDK Manager, as installing it through apt is only possible on linux-x86_64 systems (not aarch64).
Unfortunately, I do not have a Linux machine to run SDK Manager on, and SDK Manager cannot run on the Jetson device itself.

Hi,

VPI is installed when you flash the Xavier NX with SDK Manager.
You can find the samples on the device here:

/opt/nvidia/vpi-0.3/samples

Thanks.

Found it, thank you!

The KLT tracker sample looks and runs great, but it is hard for me to use: it is C++ only, writes its output to saved frames (no streaming), and needs a text file with the bounding boxes for the initial frame.
Ideally, I would use it from Python with OpenCV, select the ROI on the first frame, and take camera images as input.
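Since VPI only offers C++ here, a similar workflow can be approximated in Python with OpenCV's pyramidal Lucas-Kanade optical flow. This is a swapped-in technique, not the VPI KLT Bounding Box Tracker: it re-detects corners inside the box on every frame and moves the box by the median point displacement, ignoring scale changes. `shift_box` and `run_tracker` are hypothetical names; the OpenCV calls (`selectROI`, `goodFeaturesToTrack`, `calcOpticalFlowPyrLK`) are standard, and OpenCV/NumPy are imported inside `run_tracker` so the box-update logic can be used on its own.

```python
import statistics
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # x, y, w, h
Point = Tuple[float, float]


def shift_box(box: Box, old_pts: Sequence[Point], new_pts: Sequence[Point]) -> Box:
    """Move the box by the median displacement of the tracked points.

    The median is less sensitive to a few badly tracked points than the mean.
    """
    if not old_pts or len(old_pts) != len(new_pts):
        return box
    dx = statistics.median(n[0] - o[0] for o, n in zip(old_pts, new_pts))
    dy = statistics.median(n[1] - o[1] for o, n in zip(old_pts, new_pts))
    x, y, w, h = box
    return (x + dx, y + dy, w, h)


def run_tracker(source=0):
    """Select an ROI on the first frame, then track it frame by frame."""
    import cv2
    import numpy as np

    cap = cv2.VideoCapture(source)
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError("could not read from the video source")
    box = tuple(float(v) for v in cv2.selectROI("select target", frame, showCrosshair=True))
    cv2.destroyWindow("select target")
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        x, y, w, h = (int(v) for v in box)
        x, y = max(0, x), max(0, y)  # keep the mask slice inside the image
        # Detect corners only inside the current box.
        mask = np.zeros_like(prev_gray)
        mask[y:y + h, x:x + w] = 255
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.01,
                                     minDistance=5, mask=mask)
        if p0 is not None:
            p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
            good = status.reshape(-1) == 1
            old = [tuple(p) for p in p0.reshape(-1, 2)[good].tolist()]
            new = [tuple(p) for p in p1.reshape(-1, 2)[good].tolist()]
            box = shift_box(box, old, new)
        x, y, w, h = (int(v) for v in box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("KLT", frame)
        prev_gray = gray
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
    cap.release()
    cv2.destroyAllWindows()
```

`run_tracker(0)` opens the first camera; a video file path also works. A CSI camera on the Xavier NX usually needs a GStreamer pipeline instead, which depends on your camera setup.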