Pytorch resulting in segfault when calling convert

kjoshi · April 16, 2020, 1:16am

I am trying to run PyTorch on my Xavier and am running into the following issue, which causes a segfault.

I installed the wheel from the distribution provided at this link.

Failed to load Python extension for LZ4 support. LZ4 compression will not be available.
Loading data from:
…/tests/data/just_velodyne_points_short.bag
Loading scan: 3
Initializing network
[ 41 1200 1200]
Fatal Python error: Segmentation fault

Current thread 0x0000007f823e2010 (most recent call first):
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 423 in convert
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 223 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 201 in _apply
File "/home/nvidia/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 425 in to
File "/home/nvidia/Projects/lidar/src/map_processing/src/map_processing/second_utils.py", line 159 in build_network
File "/home/nvidia/Projects/lidar/src/map_processing/src/map_processing/second_utils.py", line 147 in __init__
File "xavier_test_script.py", line 87 in <module>
[1] 8864 segmentation fault (core dumped) python3 xavier_test_script.py -b …/tests/data/just_velodyne_points_short.bag
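Reading the stack bottom-up, the crash happens inside Module.to(): the nested _apply frames are the recursion through submodules, and convert is the helper that moves each parameter or buffer, so the fault occurs while a tensor is being copied to the device. The "Fatal Python error: Segmentation fault" / "Current thread ... (most recent call first)" block matches the output format of Python's standard faulthandler module, which appears to have been enabled in this environment. If it is not enabled in yours, a minimal way to get the same Python-level stack on a crash is sketched below; main() is a hypothetical placeholder for your own entry point. Running the script as python3 -X faulthandler xavier_test_script.py ... has the same effect without editing code.

```python
import faulthandler
import sys

# Dump the Python traceback of all threads to stderr if the interpreter
# receives a fatal signal such as SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL.
faulthandler.enable(file=sys.stderr, all_threads=True)


def main() -> None:
    # Hypothetical placeholder: build the network and move it to the GPU here,
    # i.e. the build_network() step that appears in the traceback.
    pass


if __name__ == "__main__":
    main()
```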

Here is the output of pip3 show torch torchvision:

Name: torch 
Version: 1.4.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/nvidia/.local/lib/python3.6/site-packages
Requires: 
---
Name: torchvision
Version: 0.5.0a0
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
Author-email: soumith@pytorch.org
License: BSD
Location: /usr/local/lib/python3.6/dist-packages/torchvision-0.5.0a0-py3.6-linux-aarch64.egg
Requires: numpy, six, torch, pillow
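Note that pip reports torch under /home/nvidia/.local (a per-user install) while torchvision is an egg under /usr/local. The thread does not attribute the crash to this, but when installs are split across locations it can help to confirm which files Python actually imports. A minimal stdlib-only sketch, assuming nothing beyond importlib (module_origin is a hypothetical helper name):

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Prints None for a package that is not importable at all.
    for pkg in ("torch", "torchvision"):
        print(f"{pkg}: {module_origin(pkg)}")
```

find_spec only locates the module without executing it, so this check works even if importing torch itself is what crashes.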

AastaLLL · April 16, 2020, 3:27am

Hi,

We need more information before we can give further suggestions.
Could you help us by checking the memory status first?

1. Monitor the system status with tegrastats:

$ sudo tegrastats

2. Check memory usage via cuda-memcheck:

$ sudo /usr/local/cuda/bin/cuda-memcheck python3 [your_app.py]
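On Jetson modules the integrated GPU shares system DRAM with the CPU, so the RAM figure tegrastats reports covers both. As a complement to the commands above, a script can log available memory itself around the suspect step by reading /proc/meminfo. This is a minimal sketch assuming a Linux /proc filesystem; parse_meminfo and log_memory are hypothetical helper names.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value} (values are in kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def log_memory(tag: str, path: str = "/proc/meminfo") -> None:
    """Print available vs. total memory in MiB, labelled with tag."""
    info = parse_meminfo(Path(path).read_text())
    print(f"[{tag}] MemAvailable: {info['MemAvailable'] / 1024:.0f} MiB "
          f"of {info['MemTotal'] / 1024:.0f} MiB")


if __name__ == "__main__" and Path("/proc/meminfo").exists():
    log_memory("startup")
```

For example, calling log_memory("before to()") and log_memory("after to()") around the model.to(...) call would show whether memory is exhausted at the point of the crash.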

Thanks.

kjoshi · April 16, 2020, 11:17am

Hi @AastaLLL,

Thanks a lot for your response.

I ran the program under cuda-memcheck with the following command:

/usr/local/cuda/bin/cuda-memcheck python3 my_app.py arg1 arg2 arg3 2>&1 | tee _myApp.log

Here is the log: _myApp.log (4.7 KB)

I also captured the output of sudo tegrastats 2>&1 | tee _tegraStats.log while the application was running.

Here is the log: _tegraStats.log (22.0 KB)

AastaLLL · April 17, 2020, 2:29am

Hi,

The logs look fine, so we need more information about this issue.

We would like to reproduce this issue in our environment.
Would you mind sharing a minimal reproducible example with us?

Thanks.

kjoshi · April 23, 2020, 7:13pm

Hi @AastaLLL, we were able to solve the issue by changing the order in which we load the data.

It seems that we were loading the data on the host and then moving it to the device. After the move, we were still trying to access the data from the host, which resulted in a segfault.
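The ordering described here can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions: a toy nn.Linear stands in for the real network and random data stands in for the bag-file scans; the thread does not show the actual code or which call crashed. The idea is to keep host-side reads on host memory (the original NumPy array, or an explicit .cpu() copy) and never on the device tensor.

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the real network; move it to the device first.
net = torch.nn.Linear(4, 2).to(device)

# 1. Load on the host: a NumPy array in CPU memory (random stand-in data).
points_host = np.random.rand(100, 4).astype(np.float32)

# 2. Copy to the device. On CUDA this makes a separate device copy;
#    points_host stays valid host memory for any host-side processing.
points_dev = torch.from_numpy(points_host).to(device)

with torch.no_grad():
    out_dev = net(points_dev)

# 3. Copy results back explicitly before reading them on the host.
#    Calling .numpy() directly on a CUDA tensor raises an error instead.
out_host = out_dev.cpu().numpy()
print(out_host.shape)
```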

Thanks again for your help.

Cheers!

Aravind_Seenu · November 19, 2020, 7:27pm

It is due to the PyTorch version. Install the latest version, or any version above v1.5.0, and it will fix the issue.
You can find the latest version at https://pytorch.org/ or on the Jetson Zoo (Jetson Zoo - eLinux.org).
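To check which side of the v1.5.0 boundary an installed wheel falls on, compare the numeric parts of torch.__version__ rather than the raw strings: as strings, "1.10.0" sorts below "1.5.0". A minimal sketch, not a full PEP 440 parser; release_tuple and at_least are hypothetical helper names, and the minimum is a parameter so you can choose the exact threshold.

```python
import re


def release_tuple(version: str) -> tuple:
    """Numeric release part of a version string: '1.4.0' -> (1, 4, 0).

    Simplified: drops a local suffix such as '+cu102' and stops at the first
    non-numeric part, so '0.5.0a0' -> (0, 5, 0). A pre-release therefore
    compares equal to its final release.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # pre-release / dev marker reached, e.g. '0a0'
    return tuple(parts)


def at_least(version: str, minimum: str) -> bool:
    """True if version's numeric release is >= minimum's (compared as ints)."""
    a, b = release_tuple(version), release_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '1.5' and '1.5.0' compare equal.
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


# Usage on the Xavier (requires torch):
#   import torch
#   print(torch.__version__, at_least(torch.__version__, "1.5.0"))
```

With the versions from the original post, at_least("1.4.0", "1.5.0") is False.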
