Platform: Xavier NX with JetPack 5.1 (rev. 1) installed with SDK Manager.
I pulled latest (5.1) Docker image from here:
Inside the container (which I run following the official instructions in the link above, using --runtime=nvidia), this code throws a Segmentation fault (core dumped) error:
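For reference, a typical way to start the container (the image tag here matches the r35.2.1-pth2.0-py3 tag mentioned later in this thread; adjust it to the image you actually pulled, and note this command only runs on a Jetson host with the NVIDIA container runtime installed):

```shell
# Run the L4T PyTorch container with GPU access enabled.
sudo docker run -it --rm --runtime=nvidia --network=host \
    nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3
```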
Code:
import torch
import torch.nn as nn
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
device = torch.device('cuda')
m = nn.Conv1d(7, 64, 1, bias=False, device=device)
input = torch.randn(100, 7, 32).to(device)
print('start')
output = m(input)
print('end')
print(output.shape)
Output:
PyTorch version: 2.0.0a0+ec3941ad.nv23.02
CUDA available: True
CUDA version: 11.4
start
Segmentation fault (core dumped)
However, if I set device = torch.device('cpu') instead, it works fine:
Output:
PyTorch version: 2.0.0a0+ec3941ad.nv23.02
CUDA available: True
CUDA version: 11.4
start
end
torch.Size([100, 64, 32])
Why does CUDA throw a segmentation fault? How can I fix this?
Hi,
We will try to reproduce this issue.
Will share more information with you later.
Thanks.
Additional info about my setup:
- In SDK Manager I chose to install Jetson Linux and the Jetson Runtime Components, but not the Jetson SDK Components (the 16 GB of internal eMMC memory was not enough for them).
- I configured Docker to store its images and containers on an external SD card (for the same reason: 16 GB is too small). The external SD card (SanDisk Extreme 128 GB) is attached to the NX carrier board via a USB port (using a USB-to-SD adapter).
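For reference, relocating Docker's storage is typically done with the data-root key in /etc/docker/daemon.json, followed by a Docker restart. This is a sketch; the mount path /mnt/sdcard/docker is an illustrative assumption, and on Jetson the existing nvidia runtime entry in that file must be preserved:

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "data-root": "/mnt/sdcard/docker"
}
```

After editing the file, restart the daemon (e.g. sudo systemctl restart docker) so the new storage location takes effect.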
Hi,
We have tested l4t-pytorch:r35.2.1-pth2.0-py3 on Xavier NX.
The sample runs correctly:
...
>>> print('start')
start
>>> output = m(input)
>>> print('end')
end
>>> print(output.shape)
torch.Size([100, 64, 32])
There may be an issue with how the environment was set up.
Could you reflash the system and try again?
Thanks.
I flashed again with the same parameters and I still get the segmentation fault. I suspect it may be caused by the Docker root directory being on an external drive. What does your setup look like? Are your OS, Docker, and Docker root directory on the same drive?
I moved both the rootfs and the Docker root to the SD card and tried again, and I still get a segmentation fault when using CUDA with PyTorch inside the container. I have no idea what else to try…
From your code snippet it is not clear whether you used the GPU or the CPU. On the CPU my code also works fine, as described in my question.
Hi,
We ran it on the GPU, and our container is also stored on an external SSD.
device = torch.device('cuda')
Could you check whether CUDA works in your environment (inside the container)?
Please download the CUDA samples below and run the deviceQuery example.
Thanks.
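A sketch of how deviceQuery is typically built and run from the CUDA samples repository (an assumption about the intended procedure; the directory layout varies between releases, and you may need to check out the tag matching your CUDA version, e.g. one of the 11.4 tags for JetPack 5.1):

```shell
# Build and run deviceQuery inside the container (sketch; assumes git,
# make, and the CUDA toolkit are available in the image).
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery
make
./deviceQuery   # should list the Xavier NX GPU and end with "Result = PASS"
```

If deviceQuery fails or crashes, the problem is in the CUDA setup itself rather than in PyTorch.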