DriveWorks camera acquisition into TensorRT causing segfault

moodie · February 3, 2020, 5:56pm

I’m working on reducing memory transfers of images from the driveworks image acquisition pipeline to tensorrt on my drive PX2.

My rough pipeline is as follows:

/// on the image grabbing thread
dwSensorCamera_readFrame(…);
dwSensorCamera_getImage(&image, DW_CAMERA_OUTPUT_CUDA_RGBA_UINT8, handle);
dwImage_copyConvertAsync(buffer_handle, image, stream, context);
dwImage_getCUDA(&cuda_image, buffer_handle);
cudaEventRecord(event, stream);

cudaEventSynchronize(event);

// push cuda_image to another thread

// on the tensorrt thread
// preprocess cuda_image into tensorrt input buffer
tensorrt_execution_context->enqueue(input_buffer, output_buffer, …);

With the above pipeline I get a segmentation fault inside the enqueue call. If I do not invoke enqueue, all of the preprocessing works correctly on the GPU to prepare the tensorrt input buffers.
However, if I split the work so that image acquisition runs in one process and tensorrt runs in a separate process, I have no issues (i.e. I use gpu → host → IPC → host → gpu in place of pushing the cuda_image to a separate thread).

With driveworks do we need to use the driveworks tensorrt API or can we just use tensorrt by itself?

The tensorrt thread is unaware of driveworks. It only receives images from the driveworks thread in the form of RGBA DW_IMAGE_MEMORY_TYPE_PITCH images, which is the same pitched layout that OpenCV’s GpuMat uses.
The preprocessing of the data works correctly, so I know the images sent between threads are correct. The preprocessing does not occur in place, so the buffers allocated for tensorrt’s input are unchanged.
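
For readers following along, the hand-off step ("push cuda_image to another thread") can be sketched as a small bounded queue in which the grabbing thread copies each frame into storage it owns before pushing it. This is only an illustration under stated assumptions: `Frame`, `deepCopy`, and `FrameQueue` are hypothetical names, and plain host memory stands in for the CUDA buffer so that the sketch compiles without DriveWorks or CUDA. It is not the code from the failing pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pitched RGBA image. In the real pipeline the
// pixels would live in GPU memory; a std::vector is used so this runs anywhere.
struct Frame {
    std::size_t width = 0;
    std::size_t height = 0;
    std::size_t pitch = 0;  // bytes per row
    std::vector<std::uint8_t> data;
};

// Copies a borrowed pitched image row by row into storage owned by the
// returned Frame, so the copy stays valid after the source is handed back.
Frame deepCopy(const std::uint8_t* src, std::size_t width,
               std::size_t height, std::size_t srcPitch) {
    const std::size_t rowBytes = width * 4;  // RGBA, 8 bits per channel
    assert(srcPitch >= rowBytes);
    Frame f;
    f.width = width;
    f.height = height;
    f.pitch = rowBytes;  // the copy is tightly packed
    f.data.resize(rowBytes * height);
    for (std::size_t row = 0; row < height; ++row) {
        std::copy_n(src + row * srcPitch, rowBytes,
                    f.data.data() + row * rowBytes);
    }
    return f;
}

// Bounded queue between the grabbing thread and the inference thread.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Never blocks: when the queue is full, the oldest frame is dropped.
    void push(Frame f) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (queue_.size() == capacity_) {
                queue_.pop_front();
            }
            queue_.push_back(std::move(f));
        }
        cv_.notify_one();
    }

    // Blocks until a frame is available or close() has been called.
    // Returns false once the queue is closed and empty.
    bool pop(Frame& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) {
            return false;
        }
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::size_t capacity_;
    bool closed_ = false;
    std::deque<Frame> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

In this sketch the grabbing thread would call `push(deepCopy(...))` and the inference thread would loop on `pop()` until it returns false. It only makes buffer ownership explicit at the thread boundary; it is not offered as an explanation of the crash.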

If I allow enqueue to be called, I get a segfault with the following stack trace:

[0x7f203d7340] + 0x375340
[0x7f7ae3e6c0]
[0x7f5c3a0984] + 0x2ac984
[0x7f5c3a0d48] + 0x2acd48
[0x7f5c2ad458] + 0x1b9458
[0x7f5c316298] + 0x222298
[0x7f5c2b367c] + 0x1bf67c
[0x7f5c2b4234] + 0x1c0234
[0x7f5c2b8388] + 0x1c4388
[0x7f5c295bf8] + 0x1a1bf8
[0x7f5c2963bc] + 0x1a23bc
[0x7f5c30f75c] cuEGLStreamConsumerAcquireFrame + 0x78
[0x7f5d15e080] + 0x32c080
[0x7f5cf6bd7c] + 0x139d7c
[0x7f5cf6c828] + 0x13a828
[0x7f5cf68850] + 0x136850
[0x7f5cf6934c] + 0x13734c
[0x7f5cf73674] + 0x141674
[0x7f5cf73ab4] dwSensorCamera_getImage + 0x40

or the following:

[31-1-2020 13:29:3] Driveworks exception thrown: DW_CUDA_ERROR: Call failed cuGraphicsResourceGetMappedEglFrame : unspecified launch failure

[ERROR] [1580506143.699987260]: Error occurred getting an image from the sensor frame: DW_CUDA_ERROR
[31-1-2020 13:29:3] Driveworks exception thrown: DW_BAD_CAST: dwImage_copyFormatConvert: images must be of the same type

[ERROR] [1580506143.700175778]: Unable to convert the captured image: DW_BAD_CAST
[31-1-2020 13:29:3] Driveworks exception thrown: DW_CUDA_ERROR: Call failed cuEGLStreamConsumerReleaseFrame : unspecified launch failure
what(): (cudaEventRecord(event.get(), stream)==cudaErrorLaunchFailure)

or I get the following from the tensorrt thread:

Cuda error in file src/implicit_gemm.cu at line 1159: unspecified launch failure
Error at line 289: unspecified launch failure
customWinogradConvActLayer.cpp:237: virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion `convolutions.back().get()' failed.

or this:

[0x7f38281ab4] + 0xa8ab4
[0x7f8d8246c0]
[0x7f3f45b984] + 0x2ac984
[0x7f3f45bd48] + 0x2acd48
[0x7f3f368458] + 0x1b9458
[0x7f3f478fe8] + 0x2c9fe8
[0x7f3f2967b0] + 0xe77b0
[0x7f3f296908] + 0xe7908
[0x7f3f29694c] + 0xe794c
[0x7f3f3c92a0] cuLaunchKernel + 0xb0
[0x7f8801bc0c] + 0xcc0c

I’ve tried the following variations with no observable difference:
GPU wrapping dwImageCUDA → tensorrt
GPU wrapping dwImageCUDA → user managed GPU → tensorrt
GPU wrapping dwImageCUDA → pinned CPU → tensorrt thread → GPU → tensorrt
CPU wrapping dwImageCPU → tensorrt thread → GPU → tensorrt
CPU wrapping dwImageNvMedia → tensorrt thread → GPU → tensorrt

Software versions:
driveworks: v.1.2.400-drive-linux-5.0.10.3
cudnn: 7_7.1.2.23-1+cuda9.2
nvinfer: 4.1.1-1+cuda9.2

Edit

The original code that I’m replacing did the following:

dwSensorCamera_readFrame
dwSensorCamera_getImage(&image, DW_CAMERA_OUTPUT_NATIVE_PROCESSED, …)
dwImage_copyConvert(buffer_handle, image, m_context);
//push onto queue for a conversion thread
// on conversion thread
dwImage_getNvMedia
NvMediaImageLock
// wrap surface and copy into pageable host memory
NvMediaImageUnlock
// push host memory image to tensorrt thread
// upload to GPU, preprocess, and infer.

SivaRamaKrishnaNV · February 5, 2020, 4:56am

Dear moodie,
DriveWorks APIs make use of EGLStream APIs to transfer frames across different APIs. The issue could be with synchronization. If you use our DriveWorks DNN APIs to perform inference, these kinds of issues are handled internally. Do you see any issue with using the DW DNN framework APIs with your model?

moodie · February 10, 2020, 9:02pm

Hello and thank you for your reply,
Since your response I’ve been working on refactoring everything to test using the driveworks tensorrt API.
Unfortunately, I’ve noticed something that makes me a little afraid all my efforts will be wasted.
The documentation says that I must use the file created by the TensorRT_optimization tool. Furthermore, the docs for this tool say that it only works for caffe and uff models. I wrote a tensorrt converter for mxnet specifically for my model, so this might be an issue.

Will I have to convert my model to caffe or uff format to get this to work? Or does the dwDNN_initializeTensorRTFromMemory call allow me to initialize from a binary serialized tensorrt file, similar to the tensorrt api?
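
Whichever API ends up consuming it, a serialized engine first has to be read from disk into one contiguous buffer. A minimal sketch of that step follows; `readEngineFile` is a hypothetical helper, not part of TensorRT or DriveWorks, and whether dwDNN_initializeTensorRTFromMemory accepts an engine that was not produced by the TensorRT_optimization tool is exactly the open question above.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole binary file (for example a serialized
// TensorRT engine) into memory and returns its bytes.
std::vector<char> readEngineFile(const std::string& path) {
    // Open at the end so tellg() reports the file size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    file.seekg(0, std::ios::beg);

    std::vector<char> blob(static_cast<std::size_t>(size));
    if (size > 0 && !file.read(blob.data(), size)) {
        throw std::runtime_error("short read on " + path);
    }
    return blob;
}
```

With plain TensorRT, `blob.data()` and `blob.size()` are the pointer and length that `IRuntime::deserializeCudaEngine` takes. Any equivalent call on the DriveWorks side is an assumption until confirmed.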

SivaRamaKrishnaNV · February 11, 2020, 10:26am

Dear moodie,
Yes. DriveWorks requires the model to be generated using the TensorRT_optimization tool. The last release for DRIVE PX2 supports only uff and caffe formats. Currently, we are targeting releases for the DRIVE AGX platform. Consider upgrading to the DRIVE AGX platform to get more features and updates to our SW stack.
