gbenel
June 22, 2020, 9:58pm
1
I am running SiamMask (GitHub - foolwood/SiamMask: [CVPR2019] Fast Online Object Tracking and Segmentation: A Unifying Approach) on a Jetson Xavier NX, using PyTorch.
I am currently getting a frame rate of about 5 FPS; I was hoping for something much faster.
I know PyTorch will be slower than TensorRT, but I was still expecting a much higher rate. It seems like it may be using only the CPU, not the GPU.
Could anyone help me with this or give me some insight?
I installed PyTorch through the NVIDIA tool in jetson-inference/build. It reports version 1.4.0.
Thanks!
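A quick way to test the "CPU only" suspicion is to ask PyTorch directly whether it can see a CUDA device. The snippet below is a minimal sketch, not part of the SiamMask code; the helper name `cuda_report` is made up for illustration, and the import is guarded so the script also runs where PyTorch is missing.

```python
def cuda_report():
    """Return a small dict describing whether PyTorch can see the GPU.

    Hypothetical helper for illustration; it only uses torch.cuda.is_available()
    and torch.cuda.get_device_name().
    """
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "device_name": None}

    available = bool(torch.cuda.is_available())
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": available,
        # get_device_name() fails when no CUDA device exists, so guard the call
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(cuda_report())
```

If `cuda_available` is True, it is still worth confirming that the model itself was moved to the GPU, for example by checking `next(model.parameters()).is_cuda` on the loaded network.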
Hi,
You can monitor the system status with tegrastats at the same time:
$ sudo tegrastats
If the GR3D utilization doesn’t reach 99%, it indicates that your application still has room for acceleration.
RAM 2275/7764MB (lfb 729x4MB) … GR3D_FREQ 0%@1109 …
Another thing worth checking is the system clock and device performance.
You can apply the following configuration to turn on all the CPU cores and maximize the clock rates.
$ sudo nvpmodel -m 2
$ sudo jetson_clocks
Thanks.
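To log GPU load while the tracker runs, the tegrastats lines can be parsed with a small script. This is a sketch that assumes the line format shown in the sample above (`GR3D_FREQ <util>%@<freq>` and `RAM <used>/<total>MB`), with the number after `@` taken to be the GPU frequency in MHz; the function name `parse_tegrastats` is made up for illustration.

```python
import re

# Patterns based on the sample line quoted in this thread (assumed format).
_GR3D = re.compile(r"GR3D_FREQ (\d+)%(?:@(\d+))?")
_RAM = re.compile(r"RAM (\d+)/(\d+)MB")


def parse_tegrastats(line):
    """Extract GPU utilization, GPU frequency and RAM usage from one line.

    Fields that are not present in the line are returned as None.
    """
    gpu = _GR3D.search(line)
    ram = _RAM.search(line)
    return {
        "gpu_util_pct": int(gpu.group(1)) if gpu else None,
        "gpu_freq_mhz": int(gpu.group(2)) if gpu and gpu.group(2) else None,
        "ram_used_mb": int(ram.group(1)) if ram else None,
        "ram_total_mb": int(ram.group(2)) if ram else None,
    }


sample = "RAM 2275/7764MB (lfb 729x4MB) … GR3D_FREQ 0%@1109 …"

if __name__ == "__main__":
    print(parse_tegrastats(sample))
```

Feeding it each line of the tegrastats output gives a simple record of how GPU utilization changes over a tracking run.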
gbenel
June 23, 2020, 2:29pm
4
GR3D_FREQ ranges from 77 to 99%.
CPU and clock rates maximized.
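With the GPU already busy and the clocks maximized, it can help to measure where each frame's time goes (pre-processing, network forward pass, post-processing). The class below is a minimal sketch, not part of SiamMask; the name `StageTimer` and the stage names are hypothetical. Because CUDA calls are asynchronous, an optional `sync` callable (for example `torch.cuda.synchronize`) can be passed so GPU work is finished before the clock is read.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a processing loop."""

    def __init__(self, sync=None):
        # sync: optional zero-argument callable, e.g. torch.cuda.synchronize
        self.sync = sync
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        if self.sync:
            self.sync()  # finish pending GPU work before starting the clock
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync:
                self.sync()  # wait for this stage's GPU work to complete
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Mean duration of each stage in milliseconds."""
        return {n: 1000.0 * self.totals[n] / self.counts[n] for n in self.totals}


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        with timer.stage("preprocess"):
            time.sleep(0.002)  # stand-in for cropping and resizing the frame
        with timer.stage("forward"):
            time.sleep(0.005)  # stand-in for the network forward pass
    print(timer.mean_ms())
```

In a real loop the `time.sleep` calls would be replaced by the tracker's own steps, and the per-stage means show whether the network or the surrounding Python code dominates the frame time.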
Hi,
It looks like the GPU utilization is quite high.
A better approach is to convert your model to TensorRT.
TensorRT is optimized for the Jetson platform, so it will be a better choice for inference on Jetson.
How about converting your model to ONNX format and giving it a quick try:
$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model]
This will give you a better idea of the XavierNX computational capacity.
Thanks.
gbenel
June 24, 2020, 4:21pm
6
I am trying that, and will update this thread with results.
I am not totally sure how I will then use the model with the SiamMask code, so I could use some guidance on that.
I tried converting to ONNX and got the following error:
$ /usr/src/tensorrt/bin/trtexec --onnx=model.pth
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth
[07/01/2020-12:43:58] [I] === Model Options ===
[07/01/2020-12:43:58] [I] Format: ONNX
[07/01/2020-12:43:58] [I] Model: model.pth
[07/01/2020-12:43:58] [I] Output:
[07/01/2020-12:43:58] [I] === Build Options ===
[07/01/2020-12:43:58] [I] Max batch: 1
[07/01/2020-12:43:58] [I] Workspace: 16 MB
[07/01/2020-12:43:58] [I] minTiming: 1
[07/01/2020-12:43:58] [I] avgTiming: 8
[07/01/2020-12:43:58] [I] Precision: FP32
[07/01/2020-12:43:58] [I] Calibration:
[07/01/2020-12:43:58] [I] Safe mode: Disabled
[07/01/2020-12:43:58] [I] Save engine:
[07/01/2020-12:43:58] [I] Load engine:
[07/01/2020-12:43:58] [I] Builder Cache: Enabled
[07/01/2020-12:43:58] [I] NVTX verbosity: 0
[07/01/2020-12:43:58] [I] Inputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Outputs format: fp32:CHW
[07/01/2020-12:43:58] [I] Input build shapes: model
[07/01/2020-12:43:58] [I] Input calibration shapes: model
[07/01/2020-12:43:58] [I] === System Options ===
[07/01/2020-12:43:58] [I] Device: 0
[07/01/2020-12:43:58] [I] DLACore:
[07/01/2020-12:43:58] [I] Plugins:
[07/01/2020-12:43:58] [I] === Inference Options ===
[07/01/2020-12:43:58] [I] Batch: 1
[07/01/2020-12:43:58] [I] Input inference shapes: model
[07/01/2020-12:43:58] [I] Iterations: 10
[07/01/2020-12:43:58] [I] Duration: 3s (+ 200ms warm up)
[07/01/2020-12:43:58] [I] Sleep time: 0ms
[07/01/2020-12:43:58] [I] Streams: 1
[07/01/2020-12:43:58] [I] ExposeDMA: Disabled
[07/01/2020-12:43:58] [I] Spin-wait: Disabled
[07/01/2020-12:43:58] [I] Multithreading: Disabled
[07/01/2020-12:43:58] [I] CUDA Graph: Disabled
[07/01/2020-12:43:58] [I] Skip inference: Disabled
[07/01/2020-12:43:58] [I] Inputs:
[07/01/2020-12:43:58] [I] === Reporting Options ===
[07/01/2020-12:43:58] [I] Verbose: Disabled
[07/01/2020-12:43:58] [I] Averages: 10 inferences
[07/01/2020-12:43:58] [I] Percentile: 99
[07/01/2020-12:43:58] [I] Dump output: Disabled
[07/01/2020-12:43:58] [I] Profile: Disabled
[07/01/2020-12:43:58] [I] Export timing to JSON file:
[07/01/2020-12:43:58] [I] Export output to JSON file:
[07/01/2020-12:43:58] [I] Export profile to JSON file:
[07/01/2020-12:43:58] [I]
----------------------------------------------------------------
Input filename: model.pth
ONNX IR version: 0.0.0
Opset version: 0
Producer name:
Producer version:
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[07/01/2020-12:44:00] [E] [TRT] Network must have at least one output
[07/01/2020-12:44:00] [E] [TRT] Network validation failed.
[07/01/2020-12:44:00] [E] Engine creation failed
[07/01/2020-12:44:00] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.pth
gbenel
July 12, 2020, 3:28am
8
I am trying this code (from: pysot model.pth to onnx occurred error · Issue #125 · STVIR/pysot · GitHub) to convert a SiamRPN model (not a SiamMask model).
It looks like an ONNX model was created, but I’m not sure it really worked. Converting to TRT fails, as shown after this code.
import torch
import torchvision.models as models
import torch.nn as nn
import argparse
import torchvision
import torch.onnx
from collections import OrderedDict
from pysot.core.config import cfg
from pysot.models.model_builder import ModelBuilder
parser = argparse.ArgumentParser(description='trans demo')
parser.add_argument('--config', type=str, help='config file')
parser.add_argument('--snapshot', type=str, help='model name')
args = parser.parse_args()
cfg.merge_from_file(args.config)
cfg.CUDA = torch.cuda.is_available() and cfg.CUDA
device = torch.device('cuda' if cfg.CUDA else 'cpu')
class ConvertModel(nn.Module):
    def __init__(self, model0):
        super(ConvertModel, self).__init__()
        self.model = model0

    def forward(self, template, search):
        zf = self.model.backbone(template)
        xf = self.model.backbone(search)
        cls, loc = self.model.rpn_head(zf, xf)
        return cls, loc
model0 = ModelBuilder()
model0.load_state_dict(torch.load(args.snapshot, map_location=lambda storage, loc: storage.cpu()))
model0.eval()
print(model0)
model = ConvertModel(model0)
x = torch.randn(1, 3, 127, 127)
z = torch.randn(1, 3, 287, 287)
torch_out = torch.onnx._export(model, (x,z), "model.onnx", export_params=True)
The file “model.onnx” was created.
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx
[07/11/2020-22:13:09] [I] === Model Options ===
[07/11/2020-22:13:09] [I] Format: ONNX
[07/11/2020-22:13:09] [I] Model: model.onnx
[07/11/2020-22:13:09] [I] Output:
[07/11/2020-22:13:09] [I] === Build Options ===
[07/11/2020-22:13:09] [I] Max batch: 1
[07/11/2020-22:13:09] [I] Workspace: 16 MB
[07/11/2020-22:13:09] [I] minTiming: 1
[07/11/2020-22:13:09] [I] avgTiming: 8
[07/11/2020-22:13:09] [I] Precision: FP32
[07/11/2020-22:13:09] [I] Calibration:
[07/11/2020-22:13:09] [I] Safe mode: Disabled
[07/11/2020-22:13:09] [I] Save engine:
[07/11/2020-22:13:09] [I] Load engine:
[07/11/2020-22:13:09] [I] Builder Cache: Enabled
[07/11/2020-22:13:09] [I] NVTX verbosity: 0
[07/11/2020-22:13:09] [I] Inputs format: fp32:CHW
[07/11/2020-22:13:09] [I] Outputs format: fp32:CHW
[07/11/2020-22:13:09] [I] Input build shapes: model
[07/11/2020-22:13:09] [I] Input calibration shapes: model
[07/11/2020-22:13:09] [I] === System Options ===
[07/11/2020-22:13:09] [I] Device: 0
[07/11/2020-22:13:09] [I] DLACore:
[07/11/2020-22:13:09] [I] Plugins:
[07/11/2020-22:13:09] [I] === Inference Options ===
[07/11/2020-22:13:09] [I] Batch: 1
[07/11/2020-22:13:09] [I] Input inference shapes: model
[07/11/2020-22:13:09] [I] Iterations: 10
[07/11/2020-22:13:09] [I] Duration: 3s (+ 200ms warm up)
[07/11/2020-22:13:09] [I] Sleep time: 0ms
[07/11/2020-22:13:09] [I] Streams: 1
[07/11/2020-22:13:09] [I] ExposeDMA: Disabled
[07/11/2020-22:13:09] [I] Spin-wait: Disabled
[07/11/2020-22:13:09] [I] Multithreading: Disabled
[07/11/2020-22:13:09] [I] CUDA Graph: Disabled
[07/11/2020-22:13:09] [I] Skip inference: Disabled
[07/11/2020-22:13:09] [I] Inputs:
[07/11/2020-22:13:09] [I] === Reporting Options ===
[07/11/2020-22:13:09] [I] Verbose: Disabled
[07/11/2020-22:13:09] [I] Averages: 10 inferences
[07/11/2020-22:13:09] [I] Percentile: 99
[07/11/2020-22:13:09] [I] Dump output: Disabled
[07/11/2020-22:13:09] [I] Profile: Disabled
[07/11/2020-22:13:09] [I] Export timing to JSON file:
[07/11/2020-22:13:09] [I] Export output to JSON file:
[07/11/2020-22:13:09] [I] Export profile to JSON file:
[07/11/2020-22:13:09] [I]
----------------------------------------------------------------
Input filename: model.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.3
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
[07/11/2020-22:13:11] [W] [TRT] onnx2trt_utils.cpp:217: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: onnx2trt_utils.cpp:1523 In function convMultiInput:
[8] Assertion failed: filter_dim.d[nbSpatialDims - i] == kernel_tensor_ptr->getDimensions().d[kernel_tensor_ptr->getDimensions().nbDims - i]
[07/11/2020-22:13:11] [E] Failed to parse onnx file
[07/11/2020-22:13:11] [E] Parsing model failed
[07/11/2020-22:13:11] [E] Engine creation failed
[07/11/2020-22:13:11] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model.onnx
Hi,
Please note that you will need to generate the ONNX format to feed the model into TensorRT.
You can check this page to convert the .pth file into .onnx first:
https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html
Thanks.
gbenel
July 15, 2020, 10:50am
10
Hi @AastaLLL please read through the post above, I did create the onnx model.
Hi,
Sorry about that.
The assertion comes from here:
    {
        return &tensor;
    }
    nvinfer1::IShuffleLayer* layer = ctx->network()->addShuffle(tensor);
    if (!layer)
    {
        return nullptr;
    }
    layer->setReshapeDimensions(shape);
    layer->setZeroIsPlaceholder(false);
    return layer->getOutput(0);
}
NodeImportResult scaleHelper(IImporterContext* ctx, const ::ONNX_NAMESPACE::NodeProto& node, nvinfer1::ITensor& tensor_,
    nvinfer1::ScaleMode mode, const nvinfer1::Weights& shift, const nvinfer1::Weights& scale,
    const nvinfer1::Weights& power, const char* shiftName, const char* scaleName)
{
    nvinfer1::ITensor* tensorPtr = &tensor_;
    const ShapeTensor origShape = shapeOf(*tensorPtr);
    // TensorRT scale layers support 4D(NCHW) or 5D(NCDHW) input.
This indicates that SiamRPN includes some unsupported operations.
More precisely, it looks like the spatial dimensions and the kernel dimensions do not match in a certain operation.
Here is the detailed support matrix for your reference:
https://github.com/onnx/onnx-tensorrt/blob/master/operators.md
Thanks.
gbenel
July 16, 2020, 1:14pm
12
OK - so you think there is currently no way to convert this model to TensorRT?
Hi,
We don’t have much experience with SiamMask.
It’s recommended to first check whether any part of the network architecture is not supported by TensorRT or onnx2trt.
Thanks.
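One way to carry out that check is to list the operator types used by the exported model and compare them with the onnx-tensorrt support matrix. The sketch below assumes the node list comes from `onnx.load("model.onnx").graph.node`, where each node has an `op_type` field; the helper names and the `supported_ops` set are illustrative placeholders, and the real set has to be copied from operators.md for the installed version.

```python
from collections import Counter
from types import SimpleNamespace


def count_op_types(nodes):
    """Count how often each operator type appears in a list of graph nodes.

    Works with any objects exposing an `op_type` attribute, such as the
    entries of `onnx.load(path).graph.node`.
    """
    return Counter(node.op_type for node in nodes)


def find_unsupported(op_counts, supported_ops):
    """Return the operator types (with counts) missing from supported_ops."""
    return {op: n for op, n in op_counts.items() if op not in supported_ops}


if __name__ == "__main__":
    # Stand-in nodes so the example runs without the onnx package;
    # "MadeUpOp" is a deliberately fake operator name.
    nodes = [SimpleNamespace(op_type=t) for t in ("Conv", "Conv", "Relu", "MadeUpOp")]
    supported_ops = {"Conv", "Relu"}  # placeholder, not the real support matrix
    print(find_unsupported(count_op_types(nodes), supported_ops))
```

An empty result does not guarantee that the conversion will succeed: an operator can be listed as supported and still be rejected for a particular configuration, which may be what the `convMultiInput` assertion in the log above reflects.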
gbenel
July 18, 2020, 2:00pm
14
Do you have any recommendations for single-object trackers (preferably with a user-selectable ROI) that are optimized for Jetson?
I have experimented with some of the trackers in DeepStream, but they are all object-detection-based multi-object trackers, which leads to a number of issues, mostly related to ID switching.
Hi,
You can check if the “KLT Bounding Box Tracker” in our VPI SDK can meet your requirement.
https://docs.nvidia.com/vpi/algo_klt_tracker.html
It uses the same tracking algorithm but takes the bounding box as input, so it can be customized.
Thanks.
gbenel
July 29, 2020, 11:47am
16
Thank you! Is there any python implementation of this for Jetson?
Hi,
Sorry, our VPI library currently has only a C++ API.
Thanks.
gbenel
July 30, 2020, 4:58pm
18
I see that VPI has to be installed through the SDK Manager, as installing VPI through apt is only possible on linux-x86_64 systems (not aarch64).
Unfortunately, I do not have a Linux machine to run the SDK Manager on, and it cannot be run on the Jetson device itself.
Hi,
VPI is installed when you flash NX with sdkmanager.
You can find the sample on device here:
/opt/nvidia/vpi-0.3/samples
Thanks.
gbenel
July 31, 2020, 3:24pm
21
The KLT tracker sample looks and runs great, but it is difficult for me to use because it is only in C++, writes its output to saved frames (not a stream), and needs a text file with the bounding boxes for the initial frame.
Ideally, I would use it from Python with OpenCV, with an ROI selector on the first frame and a camera image as input.
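For a Python and OpenCV setup like this, a KLT-style tracker can be approximated with OpenCV's own functions: `cv2.selectROI` for the initial box, `cv2.goodFeaturesToTrack` for points inside it, and `cv2.calcOpticalFlowPyrLK` to follow those points from frame to frame. This is not the VPI KLT bounding box tracker, only a simplified stand-in. The sketch below shows just the box-update step in plain Python, assuming the matched point pairs have already been obtained; the function name `shift_box` is made up for illustration.

```python
from statistics import median


def shift_box(box, old_pts, new_pts):
    """Move a bounding box by the median displacement of tracked points.

    box      -- (x, y, w, h), the layout returned by cv2.selectROI
    old_pts  -- list of (x, y) point positions in the previous frame
    new_pts  -- list of (x, y) positions of the same points in the new frame

    The median makes the update robust to a few badly tracked points.
    Scale changes are ignored in this simplified version.
    """
    if not old_pts or len(old_pts) != len(new_pts):
        return box  # nothing usable to track; keep the previous box
    dx = median(n[0] - o[0] for o, n in zip(old_pts, new_pts))
    dy = median(n[1] - o[1] for o, n in zip(old_pts, new_pts))
    x, y, w, h = box
    return (x + dx, y + dy, w, h)


if __name__ == "__main__":
    box = (10, 20, 50, 40)
    old = [(0, 0), (1, 1), (2, 2)]
    new = [(3, 1), (4, 2), (50, 50)]  # the last point is an outlier
    print(shift_box(box, old, new))
```

In a camera loop, the points returned by `cv2.calcOpticalFlowPyrLK` whose status flag is 1 would be passed in as `new_pts`, and the updated box would be drawn on the frame read from `cv2.VideoCapture`.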