Segmentation fault in JetPack 5.1 container when using CUDA device in PyTorch

Platform: Xavier NX with JetPack 5.1 (rev. 1) installed with SDK Manager.
I pulled the latest (5.1) Docker image from here:

Inside the container (which I run following the official instructions in the link above, with --runtime=nvidia), the following code throws a Segmentation fault (core dumped) error:
Code:

import torch
import torch.nn as nn

print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
device = torch.device('cuda')
m = nn.Conv1d(7, 64, 1, bias=False, device=device)
input = torch.randn(100, 7, 32).to(device)
print('start')
output = m(input)
print('end')
print(output.shape)

Output:

PyTorch version: 2.0.0a0+ec3941ad.nv23.02
CUDA available: True
CUDA version: 11.4
start
Segmentation fault (core dumped)

However, if I set device = torch.device('cpu'), it works fine:
Output:

PyTorch version: 2.0.0a0+ec3941ad.nv23.02
CUDA available: True
CUDA version: 11.4
start
end
torch.Size([100, 64, 32])

Why does CUDA throw a segmentation fault? How can I fix this?
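In case it helps with debugging, one general way to see where such a crash happens (a sketch using only the Python standard library, not specific to JetPack) is to enable faulthandler before the CUDA calls; on a segfault it prints the Python traceback of each thread instead of just "Segmentation fault (core dumped)":

```python
import faulthandler

# Install handlers for SIGSEGV/SIGFPE/SIGABRT/SIGBUS so that a native
# crash dumps the Python traceback of every thread to stderr.
faulthandler.enable()

print(f"faulthandler enabled: {faulthandler.is_enabled()}")

# ... the CUDA code goes here; if it segfaults now, the last Python
# line executed (e.g. the Conv1d forward call) appears in the dump.
```

The same behavior can be turned on without changing the script by running it as `python3 -X faulthandler script.py`.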

Hi,

We will try to reproduce this issue and share more information with you later.

Thanks.

Additional info about my setup:

  • In SDK Manager I chose to install Jetson Linux and Jetson Runtime Components, but not Jetson SDK Components (the 16 GB of internal eMMC memory was not enough for them).
  • I configured Docker to store its images and containers on an external SD card, for the same reason: 16 GB is too small. The external SD card (SanDisk Extreme 128 GB) is attached to the NX carrier board via a USB port, using a USB-to-SD adapter.
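For reference, relocating Docker's storage as described above is done with the `data-root` key in `/etc/docker/daemon.json`, keeping the `nvidia` runtime entry that JetPack already registers there (the mount path below is just an example, not my actual path):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "data-root": "/mnt/sdcard/docker"
}
```

After editing the file, the daemon has to be restarted (e.g. `sudo systemctl restart docker`) for the new storage location to take effect.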

Hi,

We have tested the l4t-pytorch:r35.2.1-pth2.0-py3 container on a Xavier NX.
The sample runs correctly:

...
>>> print('start')
start
>>> output = m(input)
>>> print('end')
end
>>> print(output.shape)
torch.Size([100, 64, 32])

There may have been an issue when setting up the environment.
Could you reflash the system and try it again?

Thanks.

I flashed again with the same parameters and I still get a segmentation fault. I suspect it could be caused by the Docker root directory being on an external drive. What does your setup look like? Are your OS, Docker, and the Docker root directory on the same drive?

I moved both the rootfs and the Docker root directory to the SD card and tried again, and I still get a segmentation fault when trying to use CUDA in PyTorch inside the container. I have no idea what else to try…

From your code snippet it is not clear whether you used the GPU or the CPU. On the CPU my code also works fine, as I described in my question.

Hi,

We ran it on the GPU, and our container is also stored on an external SSD:

device = torch.device('cuda')

Could you check whether CUDA works in your environment (inside the container)?
Please download the CUDA samples below and run the deviceQuery example.
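As a quick sanity check while the samples are building, a minimal Python sketch can also query the CUDA runtime directly via ctypes (this is not the official deviceQuery sample; the library name and a default CUDA runtime install are assumptions):

```python
import ctypes


def cuda_device_count(lib_name="libcudart.so"):
    """Ask the CUDA runtime how many devices it can see.

    Returns (status, count): status 0 means cudaSuccess,
    status None means the runtime library could not be loaded at all.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None, 0  # no CUDA runtime on this machine/container
    count = ctypes.c_int(0)
    # int cudaGetDeviceCount(int *count) from the CUDA Runtime API
    status = lib.cudaGetDeviceCount(ctypes.byref(count))
    return status, count.value


if __name__ == "__main__":
    status, count = cuda_device_count()
    if status is None:
        print("CUDA runtime library not found")
    elif status != 0:
        print(f"cudaGetDeviceCount failed with error {status}")
    else:
        print(f"CUDA devices visible: {count}")
```

If this already fails or reports zero devices inside the container, the problem is below PyTorch, in the container or CUDA setup itself.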

Thanks.