cudaMalloc() leads to segmentation fault

When my code allocates GPU memory, it causes a segmentation fault.

Can you provide any thoughts?

--------------------this is the code snippet:
class update_buffer{
    update_buffer(int Id);

    int devId;
    unsigned char* buffer;
    unsigned char* cudaSrc;
    unsigned char* cudaDst;
… …
update_buffer::update_buffer(int Id):devId(Id),buffer(NULL)
{
    buffer = (unsigned char*)malloc(720*576*3);
    cudaError_t cudaStatus;

    cudaStatus = cudaMalloc((void**)&cudaSrc, 576*1440);
    if (cudaStatus != cudaSuccess) {
        printf("cudaMalloc failed! %d\n",cudaStatus);
    }
    cudaStatus = cudaMalloc((void**)&cudaDst, 576*720*3);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
    }
}


------------below is gdb output:
Thread 2 "StlTextureO_rea" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb3a181b0 (LWP 17273)]
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000007fb25d7f3c in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#2 0x0000007fb26807c4 in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#3 0x0000007fb2680910 in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#4 0x0000007fb267ebc0 in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#5 0x0000007fb267ee00 in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#6 0x0000007fb25c6b84 in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#7 0x0000007fb25c8658 in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#8 0x0000007fb238b518 in ?? ()
from /usr/lib/aarch64-linux-gnu/tegra/
#9 0x0000007fb263ee34 in cuDevicePrimaryCtxRetain ()
from /usr/lib/aarch64-linux-gnu/tegra/
#10 0x00000000004828dc in cudart::contextStateManager::initPrimaryContext(cudart::device*) ()
#11 0x0000000000482b44 in cudart::contextStateManager::initDriverContext() ()
#12 0x0000000000483598 in cudart::contextStateManager::getRuntimeContextState(cudart::contextState**, bool) ()
#13 0x0000000000478674 in cudart::doLazyInitContextState() ()
#14 0x000000000045fd18 in cudart::cudaApiMalloc(void**, unsigned long) ()
#15 0x000000000048fe38 in cudaMalloc ()
#16 0x000000000043d3a4 in update_buffer::update_buffer (this=0x7fac001010,
Id=0) at …/updatebuffer.cpp:21
#17 0x00000000004104bc in CaptureGroup::GetPanoCaptureGroup ()
at …/CaptureGroup.cpp:168
#18 0x0000000000410770 in CaptureGroup::GetExtCaptureGroup ()
at …/CaptureGroup.cpp:198
#19 0x000000000043cda8 in thread_scanner () at …/scanner.cpp:109
#20 0x0000007fb716afb4 in start_thread (arg=0x43ccdc <thread_scanner(void*)>)
at pthread_create.c:335
#21 0x0000007fb6e76390 in thread_start ()
at …/sysdeps/unix/sysv/linux/aarch64/clone.S:89

One possible cause is running out of available memory (on the GPU side, no swap is available).
If that is the case, you may see traces of it in the dmesg output.

Be sure to free memory as soon as it is no longer needed. You can also check available memory before allocating to see if enough bytes are available.

I also notice that your code doesn’t check the result of malloc. If malloc fails, it returns a NULL pointer.
Accessing the buffer through a NULL address will cause a segmentation fault.
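
The two checks suggested above (verify the malloc result, and query free memory before cudaMalloc) could be sketched as follows. This is only a minimal illustration, not the original poster's code: device_has_room(), alloc_buffers() and the 16 MiB headroom are hypothetical names and values chosen for the example. Note that cudaMemGetInfo() itself needs a CUDA context, so if the crash happens during context creation (as the backtrace suggests), it would likely show up at that call instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: true only if the device reports at least
// request + headroom free bytes. Returns false if the query itself fails.
static bool device_has_room(size_t request, size_t headroom)
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t st = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (st != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(st));
        return false;
    }
    // Written this way to avoid overflow in request + headroom.
    return free_bytes >= request && free_bytes - request >= headroom;
}

// Hypothetical helper: allocate one host buffer and one device buffer of
// the same size, checking every step. On failure nothing is leaked and
// both output pointers are left as NULL.
static bool alloc_buffers(size_t bytes, unsigned char** host, unsigned char** dev)
{
    *host = NULL;
    *dev = NULL;

    unsigned char* h = (unsigned char*)malloc(bytes);
    if (h == NULL) {                      // malloc returns NULL on failure
        fprintf(stderr, "malloc(%zu) failed\n", bytes);
        return false;
    }

    const size_t headroom = (size_t)16 << 20;  // assumed safety margin (16 MiB)
    if (!device_has_room(bytes, headroom)) {
        fprintf(stderr, "not enough free device memory for %zu bytes\n", bytes);
        free(h);
        return false;
    }

    unsigned char* d = NULL;
    cudaError_t st = cudaMalloc((void**)&d, bytes);
    if (st != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(st));
        free(h);
        return false;
    }

    *host = h;
    *dev = d;
    return true;
}
```

A caller would use it as alloc_buffers(720*576*3, &buffer, &cudaDst) and bail out if it returns false, releasing the buffers later with free() and cudaFree().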

If dmesg output is useful try this while running the test code (continuous dmesg output as it happens):

sudo dmesg --follow

Below is the dmesg output. What can it tell us?

[ 6757.201574] StlTextureO_rea[7932]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x83000006
[ 6757.201583] pgd = ffffffc0a7589000
[ 6757.205298] [00000000] *pgd=0000000100a1f003, *pmd=0000000000000000

[ 6757.211638] CPU: 3 PID: 7932 Comm: StlTextureO_rea Not tainted 3.10.96+ #135
[ 6757.211643] task: ffffffc0869306c0 ti: ffffffc0820c0000 task.ti: ffffffc0820c0000
[ 6757.211649] PC is at 0x0
[ 6757.211652] LR is at 0x7fac8daf3c
[ 6757.211656] pc : [<0000000000000000>] lr : [<0000007fac8daf3c>] pstate: 60000000
[ 6757.211659] sp : 0000007fadd19cb0
[ 6757.211662] x29: 0000007fadd1a570 x28: 000000000000000c
[ 6757.211668] x27: 0000007fad3d6000 x26: 0000007fad3d7000
[ 6757.211674] x25: 000000000234dcd0 x24: 0000007fad067000
[ 6757.211679] x23: 000000000234f788 x22: 000000000234da60
[ 6757.211684] x21: 0000007fadd19e00 x20: 0000007fadd19dd0
[ 6757.211689] x19: 0000007fadd19e80 x18: 0000000000000001
[ 6757.211694] x17: 0000007fb1125d18 x16: 0000007fad342280
[ 6757.211698] x15: 0000000000000028 x14: 50b0000000070f00
[ 6757.211703] x13: 0000007fad114040 x12: 0000000000000001
[ 6757.211708] x11: 0000000000000008 x10: 0101010101010101
[ 6757.211713] x9 : 00000000000006e0 x8 : 0000007fa844a6e8
[ 6757.211717] x7 : 0000000000000000 x6 : 000000000000003f
[ 6757.211722] x5 : 0000007fadd19ee8 x4 : 0000000000000000
[ 6757.211727] x3 : 0000000000000001 x2 : 0000000000000004
[ 6757.211731] x1 : 0000007fadd19e70 x0 : 0000007fadd19e00

[ 6757.211745] Library at 0x0: 0x400000 /home/ubuntu/Release/StlTextureO_realCap
[ 6757.219067] Library at 0x7fac8daf3c: 0x7fac55b000 /usr/lib/aarch64-linux-gnu/tegra/
[ 6757.228001] vdso base = 0x7fb22fd000

There’s a NULL pointer dereference once kernel code is reached. Other than that I couldn’t tell you anything specific (@Honey_Patouceul mentioned a malloc which did not get a return value check…possibly this is the source of the NULL pointer).

In the gdb stack frame you gave the top-most part of the call which is still controlled by your application has this:

cudart::contextStateManager::initPrimaryContext(cudart::device*) ()

…I’d have to guess the device argument (or a member of the device, if the device itself is not NULL) is not valid. The reason the error shows up in a kernel message (instead of your gdb backtrace) is that the NULL pointer dereference was not in the user application…the dereference took place in the kernel after the pointer went through the library under /usr/lib/aarch64-linux-gnu/tegra/ (which did not try to dereference it, but instead passed it on). Make sure cudart::device* is non-NULL, and that any member needing to be initialized in cudart::device is non-NULL.

It is hard to tell much with so little information about the context.
Can you tell us:

  • How many previous calls succeeded before the failure?
  • How many threads are running in this application?
  • Do they share buffers, and if so, what are the locking mechanisms?
  • Same question if you have several processes sharing memory.
  • Do you know how much memory is available before launching your app?
  • Do you know the maximum amount your app could allocate/use?
  • Are you using recursive calls?

One possibility is a stack that was trashed by another thread, or one that failed to grow correctly.
The -fstack-check and -fstack-protector flags for gcc may help detect that.

Could you isolate a skeleton of your code that triggers this fault and share it?

Have you tried using cuda-memcheck?


I have tested your code and it runs fine, without a segmentation fault. Could you check it again?

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <cuda_runtime.h>

int main(int argc, char ** argv)
{
    unsigned char* buffer;
    unsigned char* cudaSrc;
    unsigned char* cudaDst;

    buffer = (unsigned char*)malloc(720*576*3);
    cudaError_t cudaStatus;

    buffer[0] = 0;

    cudaStatus = cudaMalloc((void**)&cudaSrc, 576*1440);
    if (cudaStatus != cudaSuccess) {
        printf("cudaMalloc failed! %d\n",cudaStatus);
        return -1;
    }

    cudaStatus = cudaMalloc((void**)&cudaDst, 576*720*3);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        return -1;
    }

    printf("ALL GOOD!\n");
    return 0;
}

nvcc -o test

Hi HooverLv,

Has this issue been clarified and resolved?
Is there any further information you could share?


Just a note that I recently struggled with a very similar SIGSEGV problem when calling cudaMallocHost() on a Jetson TX1, with an essentially identical stack trace. I eventually traced it to calling cudaGLSetGLDevice() in my initialization code (or at least the crash went away when I removed the call).

The code sequence was essentially:

cudaDev = 0; openGLDev = 0;

CHECKCUDA( cudaSetDevice(cudaDev) );
CHECKCUDA( cudaGetDeviceProperties(&cudaDevProp, cudaDev) );
CHECKCUDA( cudaSetDeviceFlags(cudaDeviceMapHost) );
// assert(cudaDevProp.canMapHostMemory);
CHECKCUDA( cudaGLSetGLDevice(openGLDev) );

CHECKCUDA( cudaMallocHost((void**)&cudaHostPinned, totHostPinnedBytes ) );

I now recognize that cudaGLSetGLDevice() is a deprecated interface, but I was reusing an older class and the call returned no error. The cudaMallocHost() call actually occurred some distance away in the code, so it took me a while to isolate.

This problem was observed on a Jetson TX1 running L4T 24.2 and CUDA Version 8.0.34.

Perhaps this will help someone else down the road.
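
For anyone who wants to try the same workaround, here is a rough sketch of the sequence above with the cudaGLSetGLDevice() call left out. The post does not show how CHECKCUDA is defined, so the macro below is an assumed definition (print the error and exit), and alloc_pinned_without_gl() is a hypothetical name used only for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed definition: the original CHECKCUDA macro is not shown in the post.
// This version prints the failing call and its error string, then exits.
#define CHECKCUDA(call)                                                   \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical helper: same initialization order as in the post, but
// without the deprecated cudaGLSetGLDevice() call.
static void* alloc_pinned_without_gl(size_t totHostPinnedBytes)
{
    const int cudaDev = 0;
    cudaDeviceProp cudaDevProp;
    void* cudaHostPinned = NULL;

    CHECKCUDA( cudaSetDevice(cudaDev) );
    CHECKCUDA( cudaGetDeviceProperties(&cudaDevProp, cudaDev) );
    CHECKCUDA( cudaSetDeviceFlags(cudaDeviceMapHost) );
    // cudaGLSetGLDevice() intentionally omitted here.
    CHECKCUDA( cudaMallocHost(&cudaHostPinned, totHostPinnedBytes) );

    return cudaHostPinned;  // release later with cudaFreeHost()
}
```

Whether dropping the call is acceptable depends on how the application sets up its OpenGL interop, so treat this as a starting point for isolating the crash rather than a confirmed fix.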

Hi jrecker,

Thanks for the feedback.

Just want to confirm:
Although you hit the error at cudaMallocHost(), the real cause is that some function needs to be called before calling cudaGLSetGLDevice().

Is my understanding correct?