PyTorch "Segmentation fault (core dumped)" After Forward Propagation

I have this model that I'm running on some sample batches from the Fashion-MNIST dataset:

import torchvision
import torchvision.transforms as transforms
import torch
import matplotlib.pyplot as plt 
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

trainset = torchvision.datasets.FashionMNIST(root = "./data", train = True, download = True, transform = transforms.ToTensor())
testset = torchvision.datasets.FashionMNIST(root = "./data", train = False, download = True, transform = transforms.ToTensor())

trainloader = torch.utils.data.DataLoader(trainset, batch_size = 8, shuffle = True)
testloader = torch.utils.data.DataLoader(testset, batch_size= 8, shuffle = False)
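
Just for reference, a single batch from these loaders has the shapes below (with batch_size = 8 and Fashion-MNIST's single-channel 28x28 images), which is why the first conv layer takes 1 input channel:

images, labels = next(iter(trainloader))
print(images.shape)   # torch.Size([8, 1, 28, 28])
print(labels.shape)   # torch.Size([8])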

device = torch.device("cuda:0")
print(device)

class vgg16(nn.Module):
    def __init__(self):
        super(vgg16, self).__init__()

        ## note that vgg always does same padding on convolutions
        ## dec img size by pooling and inc channels using kernels
        self.cnn_block = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding = 1),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            
            nn.Conv2d(64, 256, 3, padding = 1),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(256, 512, 3, padding = 1),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding = 1),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(2, 1)
            # out = 6x6 img
        )

        self.fc_block = nn.Sequential(
            # 6x6x512 = 18432
            nn.Linear(18432, 4096),
            nn.ReLU(),
            nn.Linear(4096, 1024),
            nn.ReLU(),
            nn.Linear(1024, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
            # nn.Softmax(dim = 1) 
        )

    def forward(self, x):
        x = self.cnn_block(x)
        x = x.view(x.size(0), -1)
        x = self.fc_block(x)
        return x
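
As a quick sanity check of the hard-coded 18432 in the first Linear layer, this snippet (just an illustration, not part of the training script) confirms that cnn_block turns a 28x28 input into a 512 x 6 x 6 feature map:

# verify the flattened feature size that feeds the first Linear layer
check = vgg16()
with torch.no_grad():
    feats = check.cnn_block(torch.zeros(1, 1, 28, 28))
print(feats.shape)                  # torch.Size([1, 512, 6, 6])
print(feats.view(1, -1).size(1))    # 18432 = 512 * 6 * 6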

I seem to be getting a segmentation fault after the forward passes complete. The program runs many forward passes successfully, but once it finishes I get the segmentation fault. I do not have this problem when I set the device to "cpu"; the error only occurs when I set the device to "cuda:0", and it is the only error I get after forward prop completes successfully.

Here is the snippet for the forward pass (running it twice):

net = vgg16().to(device)
loss_func = nn.CrossEntropyLoss()
opt = optim.Adam(net.parameters(), lr = 0.0001)

dat = next(iter(testloader))[0].to(device)
print(dat.size())
out = net(dat)
print(out)

dat = next(iter(testloader))[0].to(device)
print(dat.size())
out = net(dat)
print(out)
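
To narrow down where the crash happens, something like the following can be used (the faulthandler call and the explicit synchronize/empty_cache at the end are debugging additions on top of the snippet above, not part of the model itself):

import faulthandler
faulthandler.enable()                 # print a Python traceback if the process segfaults

net = vgg16().to(device)
dat = next(iter(testloader))[0].to(device)
out = net(dat)
print(out.size())

torch.cuda.synchronize()              # make sure all queued CUDA work has finished
del out, dat, net
torch.cuda.empty_cache()              # release cached GPU memory before the interpreter exits

The idea is that if the segfault still only shows up after the last line, the forward pass itself is not the culprit.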

Environment:

  • Python 3.6
  • PyTorch 1.6
  • L4T 32.4.3
  • JetPack 4.4
  • CUDA: 10.2.89
  • cuDNN: 8.0.0.180

Can anybody point out what's wrong? Thanks so much!

I found something that pretty much answers my post. Here it is:
