YUV420PlanarToRGBAKernel executing on default stream

Hello,

I’m working on optimizing my camera driver, which uses the DriveWorks API to pull images from my PX2 GMSL deserializer.

I’ve noticed that dwSensorCamera_getImage(&image, DW_CAMERA_OUTPUT_CUDA_RGBA_UINT8, …) executes the YUV420PlanarToRGBAKernel on the default stream. I then use dwImage_copyConvertAsync to pull that image out into user space allocated memory.

I’ve also tried using dwSensorCamera_getImage(&image, DW_CAMERA_OUTPUT_NATIVE_PROCESSED …) and then copying into an RGBA image with dwImage_copyConvertAsync, but that gives me a DW_BAD_CAST. I’m assuming OUTPUT_NATIVE_PROCESSED is not a CUDA image, hence the BAD_CAST.

What is the most efficient way of getting an image into a user-managed RGB or RGBA buffer?

Dear moodie,
As I understand it, you want to get the CUDA image buffer of the camera output to perform some operation on it.
If so, you can check the sample_object_detector sample code. The variable rgbaImage there contains a dwImageCUDA object. You can access the CUDA buffer pointer from the dwImageCUDA structure for further processing of the image.

Hello moodle,
Adding to sivaramakrishna’s suggestion:
I assume that by “user space allocated memory” you mean CPU allocated memory.
dwImage_copyConvertAsync is meant for copying from CUDA or NvMedia memory types, which may be why you get the DW_BAD_CAST error.

The most efficient way to get the image into CPU memory is to call dwSensorCamera_getImage with the requested format, call dwImage_getCUDA on the result, copy the data to CPU memory with cudaMemcpy, and then return the image to the image handle.
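
One detail to watch when doing that copy: CUDA images are usually stored with a row pitch that is larger than width × 4 bytes for RGBA, so a single flat cudaMemcpy of width × height × 4 bytes would also copy the row padding. The sketch below is plain host-side C++ so it can be run without a GPU. It shows the row-by-row copy that cudaMemcpy2D (or cudaMemcpy2DAsync, which additionally takes a stream) performs when given a source pitch, a destination pitch, the row width in bytes and the height. The names copyPitchedToPacked and packedCopyDemo, and the 3×2 frame with a 16-byte pitch, are made up for illustration and are not part of DriveWorks or CUDA. It also assumes the dwImageCUDA you get back exposes a device pointer and a pitch for the RGBA plane.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not a DriveWorks or CUDA function).
// Copies `height` rows of `rowBytes` bytes each from a source whose rows are
// `srcPitch` bytes apart into a tightly packed destination buffer.
// On the GPU, cudaMemcpy2D(dst, rowBytes, src, srcPitch, rowBytes, height, kind)
// describes the same transfer.
void copyPitchedToPacked(const std::uint8_t* src, std::size_t srcPitch,
                         std::uint8_t* dst, std::size_t rowBytes,
                         std::size_t height)
{
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(dst + y * rowBytes, src + y * srcPitch, rowBytes);
    }
}

// Builds a tiny fake RGBA frame with padded rows and returns its packed copy.
// All sizes here are illustrative assumptions.
std::vector<std::uint8_t> packedCopyDemo()
{
    const std::size_t width = 3;
    const std::size_t height = 2;
    const std::size_t bytesPerPixel = 4;                 // RGBA, 8 bits per channel
    const std::size_t rowBytes = width * bytesPerPixel;  // 12 payload bytes per row
    const std::size_t pitch = 16;                        // 4 padding bytes per row

    // Fill the padded buffer: payload bytes count upwards, padding stays 0xFF.
    std::vector<std::uint8_t> pitched(pitch * height, 0xFF);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < rowBytes; ++x) {
            pitched[y * pitch + x] = static_cast<std::uint8_t>(y * rowBytes + x);
        }
    }

    std::vector<std::uint8_t> packed(rowBytes * height);
    copyPitchedToPacked(pitched.data(), pitch, packed.data(), rowBytes, height);
    return packed;
}
```

If the destination is pinned host memory (for example from cudaMallocHost), the asynchronous variant on a non-default stream can overlap with other GPU work; with ordinary pageable memory the copy may not overlap.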

Hello and thank you for your replies.

I originally used dwImage_getCUDA on the RGBA image that I retrieved from dwSensorCamera_getImage.

What I found was that dwSensorCamera_getImage with DW_CAMERA_OUTPUT_CUDA_RGBA_UINT8 was executing the YUV420PlanarToRGBA conversion kernel on the default stream, which then gave me an RGBA image that I could extract with dwImage_getCUDA. The problem is that the conversion kernel runs on the default stream, so it ends up not executing during the forward pass of my CNN. I’d like this kernel to be invoked on a separate stream.

Dear dtmoodie,
Just to clarify, the YUV420PlanarToRGBA kernel gets called by the dwImage_copyConvertAsync() function. This function has a CUDA stream parameter, and the YUV420PlanarToRGBA kernel is launched on that stream.
Could you tell us what your use case is so that we can guide you better?

Hi dtmoodie,

Just a small correction to your understanding.
dwSensorCamera_getImage does not execute the conversion kernel on the default stream. It executes it on the stream assigned to the sensor through the dwSensorCamera_setCUDAStream API. The same stream is used for any GPU execution related to data from that camera, in this case the conversion of the image format.
If you think it makes sense for your use case, you can set a different stream for the sensor at any stage of the flow using that API.