Why does this segmentation fault occur?

tajiknomi · September 1, 2018, 9:57am

I am new to CUDA. I have written a simple program to add two vectors of a specified length. The program works fine until I exceed a certain number of elements.
When NumOfElements is 83888 it works fine, but when I increase this value tenfold, i.e. to 838880, a segmentation fault occurs.
I have enough memory to hold these values, as I am running it on a GeForce 930MX with ~2GB of global memory and ~8GB of host memory. The 3 vectors of the specified length take no more than 838880 elements x 3 arrays x 4 bytes each = ~10MB.

I debugged the code and found that the segmentation fault occurs when the instruction pointer is at printf("here0\n");.
Here is the sample code which I am running.

#include <cuda_runtime_api.h>
#include <cuda.h>
#include <stdio.h>
#include <numeric>
#include <stdlib.h>
#include <stdint.h>

#define NumOfElements 83888			// 838880 (segmentation fault)
#define NumOfThreadsPerBlock 128

__global__ void add(int32_t *a,int32_t *b,int32_t *res_dev){
	int32_t tid = threadIdx.x + (blockIdx.x * blockDim.x);
	while(tid < NumOfElements){
		res_dev[tid] = a[tid] + b[tid];
		tid += blockDim.x * gridDim.x;
	}
}

int main(void){
	printf("here0\n");
	int32_t a[NumOfElements],b[NumOfElements],res_host[NumOfElements];
	int32_t i;

	for(i=0;i<NumOfElements-1;i++){
		a[i] = i;
		b[i] = i+1;
	}
	int32_t NumOfBlocks = (NumOfElements+(NumOfThreadsPerBlock-1))/NumOfThreadsPerBlock;

	int32_t *a_dev,*b_dev,*res_dev;

// Allocate memory on device
	cudaMalloc((void**)&a_dev,sizeof(int32_t)*NumOfElements);
	cudaMalloc((void**)&b_dev,sizeof(int32_t)*NumOfElements);
	cudaMalloc((void**)&res_dev,sizeof(int32_t)*NumOfElements);
// Copy vectors from host to device
	cudaMemcpy((void*)a_dev,(void*)&a,sizeof(int32_t)*NumOfElements,cudaMemcpyHostToDevice);
	cudaMemcpy((void*)b_dev,(void*)&b,sizeof(int32_t)*NumOfElements,cudaMemcpyHostToDevice);
// Launch kernel
	printf("here1\n");
	add<<<NumOfBlocks,NumOfThreadsPerBlock>>>(a_dev,b_dev,res_dev);
	printf("here2\n");
	cudaMemcpy((void*)&res_host,(void*)res_dev,sizeof(int32_t)*NumOfElements,cudaMemcpyDeviceToHost);

	return 0;
}

Would like to know why the segmentation fault occurs.
THANKS

Robert_Crovella · September 1, 2018, 3:21pm

One problem is that there is a limit to the size of stack-based variables:

int32_t a[NumOfElements],b[NumOfElements],res_host[NumOfElements];

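When you try to create stack-based variables that are too large, your program will crash. With 838880 elements these three arrays need about 10 MB of stack, which is more than the default stack limit on many systems (typically 8 MB on Linux and 1 MB on Windows); with 83888 elements they need only about 1 MB. Because the arrays belong to main's stack frame, the fault is likely to show up at the first statement that touches the stack, which would explain why the debugger points at printf("here0\n").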
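To make the size limit concrete, here is a minimal host-only sketch that computes how many bytes the three arrays need and compares that with the current stack limit. The helper names host_array_bytes and fits_on_stack are hypothetical (they are not part of the original program), and getrlimit is POSIX-only, so this assumes Linux or a similar system.

```cpp
#include <cassert>         // used by the checks in the usage note below
#include <cstddef>
#include <cstdint>
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK (POSIX only)

// Hypothetical helper: bytes needed by the three int32_t host arrays
// (a, b, res_host) for a given element count.
std::size_t host_array_bytes(std::size_t num_elements) {
    return 3 * num_elements * sizeof(std::int32_t);
}

// Hypothetical helper: returns true if `bytes` plus some headroom fits
// under the current soft stack limit of the process.
bool fits_on_stack(std::size_t bytes, std::size_t headroom = 64 * 1024) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return false;  // limit unknown: be conservative
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return true;   // no soft limit set
    }
    const std::size_t limit = static_cast<std::size_t>(lim.rlim_cur);
    if (bytes > limit) {
        return false;
    }
    return limit - bytes >= headroom;
}
```

For example, assert(host_array_bytes(838880) == 10066560) holds, and on a system with the common 8 MB soft limit fits_on_stack(host_array_bytes(838880)) would be expected to return false.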

Instead of large stack-based variables like that, use heap-allocated variables, something like this:

int32_t *a = new int32_t[NumOfElements];

and likewise for b and res_host.
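As a rough sketch of that change, the host buffers could be allocated and filled as follows. This is only one way to do it: make_host_buffer and fill_inputs are hypothetical helper names, and std::unique_ptr is used here (instead of a bare new) only so that the memory is released automatically. Note that the loop runs over all n elements; the loop in the original post stops at NumOfElements-1 and leaves the last element uninitialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: allocates n int32_t values on the heap.
// The unique_ptr calls delete[] automatically when it goes out of scope.
std::unique_ptr<std::int32_t[]> make_host_buffer(std::size_t n) {
    return std::unique_ptr<std::int32_t[]>(new std::int32_t[n]);
}

// Hypothetical helper: fills a[i] = i and b[i] = i + 1 for every index
// in [0, n), including the last one.
void fill_inputs(std::int32_t* a, std::int32_t* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<std::int32_t>(i);
        b[i] = static_cast<std::int32_t>(i + 1);
    }
}
```

In main, auto a = make_host_buffer(NumOfElements); would replace the stack array, and a.get() yields the plain int32_t* to pass on to cudaMemcpy.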

Another problem you will run into is that your cudaMemcpy statements are incorrectly written. Even with your stack-based variables, the name of an array decays to a pointer, so there is no need to take its address:

cudaMemcpy((void*)a_dev,(void*)&a,sizeof(int32_t)*NumOfElements,cudaMemcpyHostToDevice);
                               ^                
cudaMemcpy((void*)b_dev,(void*)&b,sizeof(int32_t)*NumOfElements,cudaMemcpyHostToDevice);
                               ^
...
cudaMemcpy((void*)&res_host,(void*)res_dev,sizeof(int32_t)*NumOfElements,cudaMemcpyDeviceToHost);
                  ^

Although this is legal (if unnecessary) for a statically allocated array, it will cause problems when you convert to pointer/heap usage as I suggested: &a would then be the address of the pointer variable itself, not of the data it points to. Instead do this:

cudaMemcpy(a_dev,a,sizeof(int32_t)*NumOfElements,cudaMemcpyHostToDevice);
cudaMemcpy(b_dev,b,sizeof(int32_t)*NumOfElements,cudaMemcpyHostToDevice);
...
cudaMemcpy(res_host,res_dev,sizeof(int32_t)*NumOfElements,cudaMemcpyDeviceToHost);
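The difference between taking the address of an array and taking the address of a pointer can be checked on the host without any CUDA calls. The two function names below are hypothetical and exist only for this illustration.

```cpp
#include <cassert>  // used by the checks in the usage note below
#include <cstdint>

// For a real array, &arr and arr refer to the same address
// (only the pointer types differ), so the stray & happens to work.
bool address_of_array_is_data() {
    std::int32_t arr[4] = {0, 1, 2, 3};
    return static_cast<void*>(&arr) == static_cast<void*>(arr);
}

// For a heap pointer, &p is the address of the pointer variable on the
// stack, while p is the address of the heap data: they differ.
bool address_of_pointer_is_data() {
    std::int32_t* p = new std::int32_t[4];
    const bool same = static_cast<void*>(&p) == static_cast<void*>(p);
    delete[] p;
    return same;
}
```

Here assert(address_of_array_is_data()) and assert(!address_of_pointer_is_data()) both hold, which is why cudaMemcpy with &a reads from or writes to the wrong memory once a is a pointer.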

tajiknomi · September 2, 2018, 9:01am

Thanks for a comprehensive answer. I got it.
