Hey Team NVIDIA,
I have recently been using a Jetson Nano 2GB for model deployment. I have converted a custom model (a pose estimation application) from PyTorch to ONNX and .trt, and I am trying to run live inference on the Jetson Nano.
I have not used other custom models with it, and doing so would likely require additional pre/post-processing code to support your model.
Alternatively, you may want to try the torch2trt tool, which you can integrate directly with your PyTorch scripts to accelerate your model with TensorRT without many changes.
Hey, thanks for getting back, and also for the great tutorials online. I came across this repo and tried to use torch2trt with my model, and the Jetson got stuck for a long time. So I tried some sample code from the torch2trt repo just as a starter, and it fails to convert and returns:
Segmentation fault (core dumped)
import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet
# create some regular pytorch model...
model = alexnet(pretrained=True).eval().cuda()
# create example data
x = torch.ones((1, 3, 224, 224)).cuda()
# convert to TensorRT feeding sample data as input
model_trt = torch2trt(model, [x])
torch.save(model_trt.state_dict(), 'alexnet_trt.pth')
I have managed to convert my model to .trt via the ONNX format, but when I use OpenCV or jetson-utils to run live inference with the model, it exits. How can I overcome this issue?
You would need to modify jetson-inference to use the pre/post-processing that your model expects. In my experience, there can be significant post-processing for pose estimation models. It may be easier for you to just use something like ONNX Runtime and keep your existing Python application for the pre/post-processing.
Do you mean ONNX Runtime with CUDA? I have a working ONNX solution and tried it on a Raspberry Pi 4GB, but it's slow. I also tried running PyTorch directly with CUDA on the Jetson Nano, and it failed.
For example: my model takes in a tensor of shape (1, 3, 256, 192), outputs a tensor of shape (1, 18, 48, 48), and then I take the max values in the heatmaps and apply thresholding. If I have to put this ONNX model through the detectnet example (or a pose estimation model), how can I modify jetson-inference? I did not find much documentation online on how to do this.
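(For context, a minimal sketch of that max-plus-threshold step over the (1, 18, 48, 48) heatmaps might look like the following; the helper name and threshold value are only illustrative, not my exact code.)

import numpy as np

def heatmaps_to_keypoints(heatmaps, threshold=0.3):
    """Pick the peak of each joint heatmap and drop low-confidence joints.

    heatmaps: array of shape (1, 18, 48, 48); the threshold is illustrative.
    """
    keypoints = []
    for hm in heatmaps[0]:                                # iterate over the 18 joint heatmaps
        y, x = np.unravel_index(np.argmax(hm), hm.shape)  # location of the peak response
        score = float(hm[y, x])
        keypoints.append((int(x), int(y), score) if score >= threshold else None)
    return keypoints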
Yes, if you set the ONNX Runtime execution provider to CUDA while running it on the Jetson, it should be faster.
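Roughly, selecting the CUDA execution provider in ONNX Runtime looks like this (it requires an onnxruntime build with CUDA support; the model path below is a placeholder, and the input shape is taken from the post above):

import numpy as np
import onnxruntime as ort

# Prefer the CUDA execution provider; fall back to CPU if it is unavailable.
# "pose_model.onnx" is a placeholder path for the exported model.
session = ort.InferenceSession(
    "pose_model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 256, 192).astype(np.float32)  # matches the input shape above
heatmaps = session.run(None, {input_name: dummy})[0]        # expected shape (1, 18, 48, 48)
print(heatmaps.shape)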
Your model is a pose estimation model, so it wouldn’t run through detectnet. Since your model is of a different architecture, you would need to modify the pre/post-processing here:
Hey dusty_nv,
I used a stacked hourglass model this time and converted it to ONNX (CUDA), TensorRT, and torch2trt. I was able to do this successfully by running a random input tensor of shape (1, 3, 256, 256) through the model for 50 iterations, at about 0.025 s per iteration. The problem arises when I call the inference with the camera (code below / OpenCV) and try to pass in the image: the terminal stops, showing a RAM-too-low problem:
How can I call a live camera via jetcam on the terminal (references would help)? Thanks
import jetson.inference
import jetson.utils

net = StackedHourglass().eval().cuda()  # example: placeholder for my pose model

camera = jetson.utils.videoSource("csi://0")       # '/dev/video0' for V4L2
display = jetson.utils.videoOutput("display://0")  # 'my_video.mp4' for file

while display.IsStreaming():
    img = camera.Capture()
    # preprocessing done here ...
    detections = net(img)  # passing the image to the model
    display.Render(img)
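A rough sketch of what the commented-out preprocessing step could look like, assuming the model expects a normalized (1, 3, 256, 256) float tensor (the normalization constants here are placeholders, not the values the model was actually trained with):

import cv2
import numpy as np
import torch
import jetson.utils

def preprocess(cuda_img):
    """Convert a jetson.utils cudaImage into a (1, 3, 256, 256) float tensor."""
    jetson.utils.cudaDeviceSynchronize()               # make sure the frame is ready for CPU access
    frame = jetson.utils.cudaToNumpy(cuda_img)         # HWC, RGB
    frame = cv2.resize(frame[:, :, :3], (256, 256))    # match the model's input resolution
    frame = frame.astype(np.float32) / 255.0           # scale to [0, 1]
    frame = (frame - 0.5) / 0.5                        # placeholder normalization
    tensor = torch.from_numpy(frame).permute(2, 0, 1)  # HWC -> CHW
    return tensor.unsqueeze(0).cuda()                  # add batch dim, move to GPU

The resulting tensor would then be passed to net(...) in place of the raw cudaImage, and the heatmap output could go through the argmax/threshold step sketched earlier.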