Jetson Nano PoseNet Process Causes Segfault

mjasner · January 12, 2023, 9:30pm

I’m porting some code from Python to C/C++ and I’m running into an issue. The python code is multithreaded. In the main thread the code captures an image from a VideoSource object and then passes it to one or more worker threads, each doing a different kind of processing on the image. The first thread I’m porting runs the captured image through a PoseNet object for pose detection. The python code works fine, but in C/C++ whenever I call Process on the PoseNet object I get a segmentation fault.

I ran it through gdb and got the following stack trace from the crash:
Thread 14 “uatu_george” received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f6111fa10 (LWP 12272)]
0x0000007fb7da0e2c in tensorNet::PROFILER_BEGIN(profilerQuery) ()
from /usr/local/lib/libjetson-inference.so
(gdb) where
#0 0x0000007fb7da0e2c in tensorNet::PROFILER_BEGIN(profilerQuery) ()
at /usr/local/lib/libjetson-inference.so
#1 0x0000007fb7dacab0 in poseNet::Process(void*, unsigned int, unsigned int, imageFormat, std::vector<poseNet::ObjectPose, std::allocatorposeNet::ObjectPose >&, unsigned int) () at /usr/local/lib/libjetson-inference.so
#2 0x0000005555562fd0 in poseNet::Process(float4*, unsigned int, unsigned int, std::vector<poseNet::ObjectPose, std::allocatorposeNet::ObjectPose >&, unsigned int) (this=0x0, image=0x100e60000, width=1280, height=720, poses=std::vector of length 0, capacity 0, overlay=4)
at /usr/local/include/jetson-inference/poseNet.h:230
#3 0x0000005555562874 in clsWorkerPoseDetection::execute() (this=0x55555e2060 )
at /home/marc/src/george/device_jetson/cpp/src/workerPoseDetection.cpp:111
#4 0x00000055555623c8 in baseEntry(void*) (arg=0x55555e2060 )
at /home/marc/src/george/device_jetson/cpp/src/workerBase.cpp:5
#5 0x0000007fb7f6b088 in start_thread (arg=0x7fffffeb8f)
at pthread_create.c:463
#6 0x0000007fb77770cc in thread_start ()
at …/sysdeps/unix/sysv/linux/aarch64/clone.S:78

Is there something special I have to do to make this work in multiple threads? I thought perhaps the float4* wasn’t safe to use between threads so I tried cudaMemcpy() to copy it into a buffer owned by the thread object but that didn’t work either.

I’m a little confused on how to proceed.

AastaLLL · January 13, 2023, 3:25am

Hi,

Segmentation fault might occurs due to some invalid memory access.

Would you mind sharing the source before and after the TensorRT with us?
Could you also check the memory copy is done before the inference?

Thanks.

mjasner · January 13, 2023, 3:39am

Of course. Thanks for the reply.

The main loop looks like this:

int main( int argc, char** argv )
{
        videoSource* inStream;
        glDisplay* display;
        float4* frame = NULL;

        //Register signal handler
        if( signal(SIGINT, signalHandler)==SIG_ERR )
                printf("ERROR: Unable to register signal handler\n");

        //Create videoSource object
        inStream = videoSource::Create(argv[1]);
        if( !inStream )
        {
                printf("ERROR: Failed to create videoSource object\n");
                return -1;
        }

        //Set worker resolution before starting it
        worker.setResolution(inStream->GetHeight(), inStream->GetWidth());

        pthread_t t;
        pthread_create(&t, NULL, baseEntry, (void*)&worker);

        //Main body loop
        while( runFlag==true )
        {
                if( !inStream->Capture(&frame, 1000) )
                {
                        if( !inStream->IsStreaming() )
                        {
                                printf("Input Stream EOS Detected\n");
                                break;
                        }

                        printf("ERROR: Failed to capture next frame\n");
                        continue;
                }

                worker.addToQueue(frame);

        }
        printf("Exiting\n");

        //Cleanup
        SAFE_DELETE(inStream);
        return 0;
}

And the execute() loop in the worker thread looks like this:

void clsWorkerPoseDetection::execute() {
        float4* frame=NULL;

        //Allocate image buffer
        if( !cudaAllocMapped(&imgBuf, width ,height) )
        {
                printf("ERROR: Error allocating image buffer\n");
                imgBuf=NULL;
        }

        //Run until commanded to stop
        while(RunFlag)
        {
                if( inQ.size()>0 )
                {
                        frame=NULL;
                        //Get frame from queue
                        printf("Worker %s: inQ size = %lu\n", name, inQ.size());
                        pthread_mutex_lock(&mutex);
                        frame=inQ.front();
                        inQ.pop();
                        pthread_mutex_unlock(&mutex);

                        //If we got a frame, then process it
                        std::vector<poseNet::ObjectPose> poses;
                        if( !pNet->Process(imgBuf, width, height, poses, overlayFlags) )
                        {
                                printf("ERROR: Failed to execute posenet on captured frame\n");
                                continue;
                        }
                }
                else
                {
                        sleep(.01);
                        continue;
                }
        }
}

imgBuf is defined as a float4* and it belongs to the class that executes the thread. inQ is a std:queue<float4*>.

AastaLLL · January 19, 2023, 4:32am

Hi,

Which camera library do you use?
OpenCV or MMAPI?

Thanks.

dusty_nv · January 19, 2023, 3:17pm

Hi @mjasner, that call to PROFILE_BEGIN() is the first thing really run inside poseNet::Process() - are you sure your poseNet object is valid? I don’t see where pNet pointer is created at.

mjasner · January 19, 2023, 3:27pm

I’m using the videoSource object from jetson_inference. Whatever the default there is, as I haven’t changed anything.

the pNet pointer is allocated in the constructor for the worker thread class as follows:

clsWorkerPoseDetection::clsWorkerPoseDetection(const char* strName)
{
        //Clear members
        memset(name,0,WORKERBASE_NAME_LENGTH);
        RunFlag=true;

        //Set worker name
        strcpy(name, strName);

        //Initialize mutex object
        if (pthread_mutex_init(&mutex, NULL) != 0)
        {
                printf("Worker %s: mutex init failed\n", name);
        }

        //Allocate PoseNet object
        poseNet* net = poseNet::Create(); //TODO: Look into arguments
        if( !net )
        {
                printf("ERROR: Unable to create pose detection network object\n");
        }

        //Define overlay flags
        overlayFlags = poseNet::OverlayFlagsFromStr("overlay links,keypoints");

        //Set default resolution
        height=720;
        width=1280;

}

mjasner · January 19, 2023, 3:32pm

Well it turns out I’m apparently an idiot. The pNet pointer is allocated in the constructor but sure enough it’s not working and so the pNet pointer is NULL. I’ll fix that and hide my head in shame at missing the obvious… sorry about that. Thanks for all the help.

dusty_nv · January 19, 2023, 3:32pm

mjasner:

        poseNet* net = poseNet::Create(); //TODO: Look into arguments
        if( !net )
        {
                printf("ERROR: Unable to create pose detection network object\n");
        }

I don’t see where that sets the pNet pointer, it sets a local variable (net) that goes out of scope after the constructor ends. Regardless, these multi-threaded issues are difficult to debug from just looking at the source - I would recommend stripping things down, starting with running things as single-threaded and slowly introducing the threaded aspects or add additional logging to try and detect the issues.

mjasner · January 19, 2023, 3:32pm

Yep, when I moved things into the class structure I made copy/paste mistakes… Sorry. When I was debugging I checked EVERY pointer object apparently EXCEPT pNet. shakes head sadly

dusty_nv · January 19, 2023, 3:36pm

OK, no worries at all! A couple of other things I noticed:

you are processing imgBuf with poseNet, but presumably that is a blank image buffer because it’s never copied over from the input frame. I think it should just be able to process the input frame directly without needing imgBuf
the frames that come from videoSource are stored in a ringbuffer, so those get re-used. If necessary you can increase the number of frames in the ringbuffer in the videoOptions struct. Otherwise if your processing takes too long, videoSource can begin overwriting a frame while you are still using it (IIRC the default number of buffers is 4)
I think overlayFlags = poseNet::OverlayFlagsFromStr("overlay links,keypoints"); should be overlayFlags = poseNet::OverlayFlagsFromStr("links,keypoints"); instead

mjasner · January 19, 2023, 3:38pm

Thanks for the tips. The imgBuf thing is because initially I thought there was some issue processing the float4* and so I was going to copy the float4 into it. I realized that wasn’t the issue so stopped there. I have cleanup work to do.

Thanks for the tips.

system · February 21, 2023, 8:02am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.