PyTorch segfault when calling convert

I am trying to run PyTorch on my Xavier and am hitting the segfault shown below.

I installed the wheel from the distribution provided at this link.

Failed to load Python extension for LZ4 support. LZ4 compression will not be available.
Loading data from:
…/tests/data/just_velodyne_points_short.bag
Loading scan: 3
Initializing network
[ 41 1200 1200]
Fatal Python error: Segmentation fault

Current thread 0x0000007f823e2010 (most recent call first):
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 423 in convert
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 425 in to
File "/home/nvidia/Projects/lidar/src/map_processing/src/map_processing/second_utils.py", line 159 in build_network
File "/home/nvidia/Projects/lidar/src/map_processing/src/map_processing/second_utils.py", line 147 in __init__
File "xavier_test_script.py", line 87 in <module>
[1] 8864 segmentation fault (core dumped) python3 xavier_test_script.py -b …/tests/data/just_velodyne_points_short.bag

Here is the output of pip3 show torch torchvision:

Name: torch 
Version: 1.4.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/nvidia/.local/lib/python3.6/site-packages
Requires: 
---
Name: torchvision
Version: 0.5.0a0
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0-py3.6-linux-aarch64.egg
Requires: numpy, six, torch, pillow

Hi,

We need more information before we can give a further suggestion.
Could you help us check the memory status first?

1. Monitor the system status with tegrastats:

$ sudo tegrastats

2. Check memory usage via cuda-memcheck:

$ sudo /usr/local/cuda/bin/cuda-memcheck python3 [your_app.py]

Thanks.
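
In addition to the two checks above: the "Fatal Python error: Segmentation fault" trace in the first post is in the format printed by Python's built-in faulthandler module. If a script does not print such a trace on a crash, a minimal sketch like the following turns it on from inside the script. The helper name enable_crash_traces is made up for illustration; the same effect is available without code changes via python3 -X faulthandler.

```python
import faulthandler
import sys


def enable_crash_traces(stream=None):
    """Ask the interpreter to dump the Python stack if it receives a fatal signal.

    `stream` must be a real file object with a file descriptor; it defaults
    to the process's stderr. (Hypothetical helper, not part of PyTorch.)
    """
    if stream is None:
        stream = sys.stderr
    if not faulthandler.is_enabled():
        # all_threads=True prints every thread's stack, not only the crashing one.
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()
```

Calling enable_crash_traces() at the very top of the script, before importing torch, means the stack of a crash inside a native extension is written out before the process dies.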

Hi @AastaLLL,

Thanks a lot for your response.

I ran the program under cuda-memcheck with the following command:

/usr/local/cuda/bin/cuda-memcheck python3 my_app.py arg1 arg2 arg3 2>&1 | tee _myApp.log

Here is the log: _myApp.log (4.7 KB)

I also captured the output of sudo tegrastats 2>&1 | tee _tegraStats.log while the application was running.

Here is the log: _tegraStats.log (22.0 KB)

Hi,

The logs look fine, so we need more information about this issue.

We would like to reproduce this issue in our environment.
Would you mind sharing a minimal reproducible example with us?

Thanks.

Hi @AastaLLL, we were able to solve the issue by changing the order in which we loaded the data.

It seems that we were loading the data on the host and then moving it onto the device. After moving it to the device, we were still trying to access the data from the host, which resulted in the segfault.

Thanks again for your help.

Cheers!
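
The thread does not include the actual code, so the following is only a sketch of what "move everything to the device first, then use only the device-side objects" might look like. The function name forward_on_device is made up, and the function is written against the PyTorch-style .to(device) convention rather than the poster's real network.

```python
def forward_on_device(model, batch, device):
    """Run `model` on `batch` with both placed on the same device.

    Assumes `model` and `batch` follow the PyTorch convention of exposing
    `.to(device)`. Hypothetical helper for illustration only.
    """
    # nn.Module.to moves the parameters and returns the module itself.
    model = model.to(device)
    # Tensor.to returns a tensor on `device`; rebinding the name ensures the
    # old host-side tensor is not passed to the device-side model by mistake.
    batch = batch.to(device)
    output = model(batch)
    # Copy the result back to the host before handing it to CPU-only code.
    return output.to("cpu")
```

With PyTorch this would be called as forward_on_device(net, points, torch.device("cuda")), where net and points stand in for the network and the input tensor.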

This is due to the PyTorch version. Installing the latest version, or any version above v1.5.0, will fix the issue.
You can find the latest version at https://pytorch.org/ or in the Jetson Zoo on eLinux.org.
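
To check an installed version against that suggestion, a small sketch like this compares version strings numerically. The helper names are made up, and treating 1.5.0 itself as acceptable is an assumption, since "above v1.5.0" could also be read strictly.

```python
import re


def release_tuple(version):
    """Return the leading numeric part of a version string as a tuple of ints.

    Suffixes such as the "a0" in "0.5.0a0" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum="1.5.0"):
    """True if `installed` is at least `minimum` (assumed threshold: 1.5.0)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.5" and "1.5.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

The string to pass in is the Version field from pip3 show torch, or torch.__version__ from inside Python. For the 1.4.0 install shown in the first post, meets_minimum("1.4.0") is False.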