Segmentation fault in JetPack 5.1 container when using CUDA device in PyTorch

Platform: Xavier NX with JetPack 5.1 (rev. 1) installed with SDK Manager.
I pulled the latest (5.1) l4t-pytorch Docker image from here:

Inside the container (which I run following the official instructions in the link above, with --runtime=nvidia), I get a Segmentation fault (core dumped) error from this code:
Code:

import torch
import torch.nn as nn

print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
device = torch.device('cuda')
m = nn.Conv1d(7, 64, 1, bias=False, device=device)
input = torch.randn(100, 7, 32).to(device)
print('start')
output = m(input)
print('end')
print(output.shape)

Output:

PyTorch version: 2.0.0a0+ec3941ad.nv23.02
CUDA available: True
CUDA version: 11.4
start
Segmentation fault (core dumped)

However, if I set device = torch.device('cpu'), it works fine:
Output:

PyTorch version: 2.0.0a0+ec3941ad.nv23.02
CUDA available: True
CUDA version: 11.4
start
end
torch.Size([100, 64, 32])

Why does CUDA throw a segmentation fault? Can I fix this?
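
In case it helps with triage, here is a smaller isolation script I can also run. It is only a sketch: it first exercises a plain CUDA kernel and then runs the same Conv1d with and without cuDNN (the torch.backends.cudnn.enabled toggle is a standard PyTorch switch I am adding for this experiment; it is not part of the original repro):

import torch
import torch.nn as nn

device = torch.device('cuda')

# Step 1: a plain CUDA kernel, no cuDNN involved.
x = torch.randn(100, 7, 32, device=device)
print('plain CUDA op:', (x * 2).sum().item())

# Step 2: the same Conv1d with cuDNN disabled, so PyTorch falls back
# to its native convolution kernels.
torch.backends.cudnn.enabled = False
m = nn.Conv1d(7, 64, 1, bias=False, device=device)
print('conv without cuDNN:', m(x).shape)

# Step 3: re-enable cuDNN and repeat the original repro step.
torch.backends.cudnn.enabled = True
print('conv with cuDNN:', m(x).shape)

If the plain op and the no-cuDNN convolution succeed but the last step crashes, that would point at cuDNN rather than CUDA itself.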

Hi,

We are going to try to reproduce this issue.
We will share more information with you later.

Thanks.

Additional info about my setup:

  • In SDK Manager I chose to install Jetson Linux and the Jetson Runtime Components, but not the Jetson SDK Components (16 GB of internal eMMC was not enough for them).
  • I configured Docker to store its images and containers on an external SD card (for the same reason: 16 GB is too small). The external SD card (SanDisk Extreme 128 GB) is attached to the NX carrier board via a USB port (through a USB-to-SD adapter).

Hi,

We have tested the l4t-pytorch:r35.2.1-pth2.0-py3 container on Xavier NX.
The sample runs correctly.

...
>>> print('start')
start
>>> output = m(input)
>>> print('end')
end
>>> print(output.shape)
torch.Size([100, 64, 32])

There may be an issue with how the environment was set up.
Could you reflash the system and try again?

Thanks.

I flashed again with the same parameters and I still get the segmentation fault. I suspect it may be caused by the Docker root directory being on an external drive. What does your setup look like? Are your OS, Docker, and Docker root directory all on the same drive?

I moved both the rootfs and the Docker root to the SD card and tried again, and I still get a segmentation fault when trying to use CUDA with PyTorch inside the container. I have no idea what else to try…
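
In the meantime, one extra data point I can collect: enabling Python's built-in faulthandler before the repro, so at least the Python-level frame where the native crash happens gets printed (this is just the standard-library module, nothing Jetson-specific):

import faulthandler
faulthandler.enable()  # dump a Python traceback on SIGSEGV and other fatal signals

import torch
import torch.nn as nn

device = torch.device('cuda')
m = nn.Conv1d(7, 64, 1, bias=False, device=device)
x = torch.randn(100, 7, 32, device=device)
print('start')
print(m(x).shape)  # if this segfaults, the traceback should point at this call
print('end')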

From your code snippet it is not clear whether you used the GPU or the CPU. On the CPU my code also works fine, as I described in my question.
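
In case it helps, a quick way to confirm which device was actually used is to print where the layer's weights and the tensors live after the run (these are just standard PyTorch attributes):

# run after the repro finishes, to confirm everything really lives on the GPU
print(next(m.parameters()).device)   # expected: cuda:0
print(input.device, output.device)   # expected: cuda:0 cuda:0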

Hi,

We ran it on the GPU, and our container is also stored on an external SSD.

device = torch.device('cuda')

Could you check whether CUDA works in your environment (inside the container)?
Please download the CUDA sample below and run the deviceQuery example.
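
Alongside deviceQuery, a quicker check can also be run from Python inside the container with the standard torch.cuda API, for example:

import torch

# basic CUDA visibility check from inside the container
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # Xavier NX should report (7, 2)
print(torch.cuda.get_device_properties(0))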

Thanks.
