Segfault Retrieving Results

rs277 · August 17, 2020, 4:42am

I have a couple of kernels, each of which write results to Sout and printf’s within the kernels confirm things are operating correctly, no problems.

If I uncomment the line “cudaMemcpy(Sout_h, Sout_d, 0x4000000, cudaMemcpyDeviceToHost);” to get the results back, it compiles and cuda-memchecks without error, but when executed, there is a seg fault in main().

I am very new to debugging - valgrind --tool=memcheck produces:

==4003== Warning: client switching stacks? SP change: 0x1fff000348 → 0x1ffb000300
==4003== to suppress, use: --max-stackframe=67108936 or greater
==4003== Invalid write of size 8
==4003== at 0x403216: main ()
==4003== Address 0x1ffb000320 is on thread 1’s stack
==4003==
==4003==
==4003== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==4003== Access not within mapped region at address 0x1FFB000320
==4003== at 0x403216: main ()

I’m baffled by the reference to “Invalid write of size 8”, as I am not using any 64 bit values.

As I get stuck into learning gdb, I wondered if anyone can see a glaring error here?

Many thanks.

int main(void){

    uint8_t A_h[8] = {
    0xBA, 0x3F, 0xD8, 0xD1, 0x2A, 0xEE, 0x14, 0x2C 
    };      
    uint8_t *A_d;
                    
    uint8_t B_h[12] = {
    0xA8, 0x3F, 0x28, 0x41, 0xC9, 0xFE, 0x74, 0xFD, 0xB7, 0x8A, 0x27, 0x00
    };      
    uint8_t *B_d;

    uint32_t Sout_h[0x1000000];                        
    uint32_t *Sout_d;
                    
    //alloc device memory
    cudaMalloc((void **) &A_d, 8);
    cudaMalloc((void **) &B_d, 12);
    cudaMalloc((void **) &Sout_d, 0x4000000); // 64MB
                    
            
    //Copy array
    cudaMemcpy(A_d, A_h, 8, cudaMemcpyHostToDevice);
    cudaMemcpy(B_d, B_h, 12, cudaMemcpyHostToDevice);  
            
    kern1<<<BLOCKS,THREADS,0,0>>>(A_d, B_d, Sout_d);
    kern2<<<BLOCKS,THREADS,0,0>>>(A_d, B_d, Sout_d);
                    
    cudaDeviceSynchronize();

// cudaMemcpy(Sout_h, Sout_d, 0x4000000, cudaMemcpyDeviceToHost);

    //free device memory
    cudaFree(A_d);
    cudaFree(B_d);
    cudaFree(Sout_d);
                    
    return 0;

}

Robert_Crovella · January 3, 2021, 7:06am

That creates a “large” stack-based array. I would generally avoid that. If you make it large enough (it may be large enough already) you’re going to run out of stack space. Instead, switch to a dynamic allocation using e.g. new or malloc

rs277 · January 3, 2021, 7:51pm

Thanks Robert.

Topic		Replies	Views
How to track down a Segmentation Fault in Big Programs CUDA Programming and Performance	5	1832	January 26, 2011
Allocating array pointers segfaults What am I doing wrong? CUDA Programming and Performance	6	1764	April 21, 2009
How to allocate memory for host in cuda main function? CUDA Programming and Performance	5	299	July 28, 2023
Beginner at Cuda seg faulting CUDA Programming and Performance	2	436	August 31, 2016
Invalid __global__ write of size 4. Need help with debugging CUDA Programming and Performance cuda	3	974	October 12, 2021
Beginner at Cuda seg faulting CUDA Programming and Performance	0	426	August 31, 2016
CUDA C++ Segmentation Fault CUDA Programming and Performance	7	14761	October 1, 2017
cudaError at memory locat.... HELP CUDA Programming and Performance	7	22633	December 9, 2009
cudaFree is returning an unrecognised error code CUDA Programming and Performance	10	7946	March 13, 2009
Cuda unknown error on the shaded line.can anyone help? CUDA Programming and Performance	4	1264	March 12, 2015

Segfault Retrieving Results

Related topics