DriveWorks camera acquisition into TensorRT causing segfault

I’m working on reducing memory transfers of images from the DriveWorks image acquisition pipeline to TensorRT on my DRIVE PX2.

My rough pipeline is as follows:

// on the image-grabbing thread
dwSensorCamera_readFrame(…);
dwSensorCamera_getImage(&image, DW_CAMERA_OUTPUT_CUDA_RGBA_UINT8, handle);
dwImage_copyConvertAsync(buffer_handle, image, stream, context);
dwImage_getCUDA(&cuda_image, buffer_handle);
cudaEventRecord(event, stream);
cudaEventSynchronize(event);

// push cuda_image to another thread

// on the TensorRT thread
// preprocess cuda_image into the TensorRT input buffer
tensorrt_execution_context->enqueue(input_buffer, output_buffer, …);

With the above pipeline I get a segmentation fault inside the enqueue call. If I do not invoke enqueue, all of the GPU preprocessing that prepares the TensorRT input buffers works correctly.
However, if I split the work into two processes, with image acquisition running in one process and TensorRT in another, I have no issues (i.e. GPU -> host -> IPC -> host -> GPU in place of pushing the cuda_image to a separate thread).

With DriveWorks, do we need to use the DriveWorks TensorRT API, or can we use TensorRT by itself?

The TensorRT thread is unaware of DriveWorks; it only receives images from the DriveWorks thread as RGBA DW_IMAGE_MEMORY_TYPE_PITCH images, which matches the layout OpenCV GpuMats use natively.
The preprocessing of the data works correctly, so I know the images sent between threads are correct. The preprocessing does not occur in place, so the allocation of TensorRT’s input buffers is unchanged.

If I allow enqueue to be called, I get a segfault with the following stack trace:


[0x7f203d7340] + 0x375340
[0x7f7ae3e6c0]
[0x7f5c3a0984] + 0x2ac984
[0x7f5c3a0d48] + 0x2acd48
[0x7f5c2ad458] + 0x1b9458
[0x7f5c316298] + 0x222298
[0x7f5c2b367c] + 0x1bf67c
[0x7f5c2b4234] + 0x1c0234
[0x7f5c2b8388] + 0x1c4388
[0x7f5c295bf8] + 0x1a1bf8
[0x7f5c2963bc] + 0x1a23bc
[0x7f5c30f75c] cuEGLStreamConsumerAcquireFrame + 0x78
[0x7f5d15e080] + 0x32c080
[0x7f5cf6bd7c] + 0x139d7c
[0x7f5cf6c828] + 0x13a828
[0x7f5cf68850] + 0x136850
[0x7f5cf6934c] + 0x13734c
[0x7f5cf73674] + 0x141674
[0x7f5cf73ab4] dwSensorCamera_getImage + 0x40

or the following:


[31-1-2020 13:29:3] Driveworks exception thrown: DW_CUDA_ERROR: Call failed cuGraphicsResourceGetMappedEglFrame : unspecified launch failure

[ERROR] [1580506143.699987260]: Error occurred getting an image from the sensor frame: DW_CUDA_ERROR
[31-1-2020 13:29:3] Driveworks exception thrown: DW_BAD_CAST: dwImage_copyFormatConvert: images must be of the same type

[ERROR] [1580506143.700175778]: Unable to convert the captured image: DW_BAD_CAST
[31-1-2020 13:29:3] Driveworks exception thrown: DW_CUDA_ERROR: Call failed cuEGLStreamConsumerReleaseFrame : unspecified launch failure
what(): (cudaEventRecord(event.get(), stream)==cudaErrorLaunchFailure)

or I get the following from the TensorRT thread:


Cuda error in file src/implicit_gemm.cu at line 1159: unspecified launch failure
Error at line 289: unspecified launch failure
customWinogradConvActLayer.cpp:237: virtual void nvinfer1::cudnn::WinogradConvActLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion `convolutions.back().get()’ failed.

or this:


[0x7f38281ab4] + 0xa8ab4
[0x7f8d8246c0]
[0x7f3f45b984] + 0x2ac984
[0x7f3f45bd48] + 0x2acd48
[0x7f3f368458] + 0x1b9458
[0x7f3f478fe8] + 0x2c9fe8
[0x7f3f2967b0] + 0xe77b0
[0x7f3f296908] + 0xe7908
[0x7f3f29694c] + 0xe794c
[0x7f3f3c92a0] cuLaunchKernel + 0xb0
[0x7f8801bc0c] + 0xcc0c

I’ve tried the following variations with no observable difference:
GPU wrapping dwImageCUDA -> TensorRT
GPU wrapping dwImageCUDA -> user-managed GPU -> TensorRT
GPU wrapping dwImageCUDA -> pinned CPU -> TensorRT thread -> GPU -> TensorRT
CPU wrapping dwImageCPU -> TensorRT thread -> GPU -> TensorRT
CPU wrapping dwImageNvMedia -> TensorRT thread -> GPU -> TensorRT

Software versions:
DriveWorks: v.1.2.400-drive-linux-5.0.10.3
cuDNN: 7_7.1.2.23-1+cuda9.2
nvinfer: 4.1.1-1+cuda9.2


Edit

The original code that I’m replacing did the following:

dwSensorCamera_readFrame(…);
dwSensorCamera_getImage(&image, DW_CAMERA_OUTPUT_NATIVE_PROCESSED, …);
dwImage_copyConvert(buffer_handle, image, m_context);
// push onto a queue for a conversion thread

// on the conversion thread
dwImage_getNvMedia(…);
NvMediaImageLock(…);
// wrap the surface and copy into pageable host memory
NvMediaImageUnlock(…);
// push the host-memory image to the TensorRT thread

// on the TensorRT thread
// upload to the GPU, preprocess, and infer

Dear moodie,
DriveWorks APIs make use of EGLStream APIs to transfer frames across different APIs. It seems the issue could be with synchronization. If you use our DriveWorks DNN APIs to perform inference, these kinds of issues are handled internally. Do you see any issue with using the DW DNN framework APIs for your model?

Hello, and thank you for your reply.
Since your response I’ve been refactoring everything to test the DriveWorks TensorRT API.
Unfortunately I’ve noticed something, and I’m a little afraid all my effort may be wasted.
The documentation says I must use the file created by the TensorRT_optimization tool, and the docs for that tool state it only works for Caffe and UFF models. I wrote a TensorRT converter for MXNet specifically for my model, so this might be an issue.

Will I have to convert my model to Caffe or UFF format to get this to work? Or does the dwDNN_initializeTensorRTFromMemory call allow me to initialize from a binary serialized TensorRT file, similar to the TensorRT API?

Dear moodie,
Yes, DriveWorks requires the model to be generated with the TensorRT_optimization tool. The last release for DRIVE PX2 supports only UFF and Caffe formats. Currently we are targeting releases for the DRIVE AGX platform; consider upgrading to the DRIVE AGX platform to get more features and updates on our SW stack.