Converting from YV12 to RGBA within jetson-inference

Hi

I must apologise first: I am an educational user who is still learning to code, both in general and, at the moment, on the TX2, so please forgive me if I overlook certain obvious solutions.

I have been able to get a GStreamer camera stream from my Kinect v2 colour image stream by using another program (ros-virtual-cam), which creates a device on /dev/video1 fed by kinect2_bridge. I can view the GStreamer stream fine.

I want to pass it to jetson-inference, but if I understand correctly it is not currently set up to accept the YV12 pixel format (the format of the Kinect v2). There are some conversion utilities in cudaYUV.h, but none that seems to convert from YV12 to RGBA.

At the moment, when I try to run jetson-inference I get an error saying it could not convert from NV12, which is why I am assuming it is defaulting to the format of the default camera stream.

Is there a function that can go from YV12 to RGBA? If you could provide steps it would be enormously appreciated; my inexperience makes finding the relevant sections in the source code quite a challenge.

Many thanks for reading

Hi,

Maybe you can use GStreamer to convert the Kinect frame into YV12:
For example, use this videoconvert component:
[url]https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-plugins/html/gst-plugins-base-plugins-videoconvert.html[/url]

You can try it on console first and update the command here:
[url]https://github.com/dusty-nv/jetson-utils/blob/b337733d4f1000b87979cb982ced30445d42b5b3/camera/gstCamera.cpp#L359[/url]
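For illustration, the V4L2 branch there builds the pipeline string along the lines below (I am paraphrasing the member names from memory, so please check the linked source). You could change the caps requested from v4l2src, for example asking for YUY2 and letting videoconvert produce the RGB that the appsink expects:

// sketch of the V4L2 pipeline string in gstCamera::buildLaunchStr()
// (member names approximate; format=YUY2 replaces the stock format=RGB)
ss << "v4l2src device=/dev/video" << mV4L2Device << " ! ";
ss << "video/x-raw, width=(int)" << mWidth << ", height=(int)" << mHeight << ", ";
ss << "format=YUY2 ! videoconvert ! video/x-raw, format=RGB ! videoconvert ! ";
ss << "appsink name=mysink";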

Thanks.

Hi

Thank you for replying; however, I am still stuck! I am not trying to convert to YV12. I am just trying to give jetson-inference any format it's happy with, and YV12 just happens to be one of the potential input formats I have access to. Maybe it will help if I describe the problem a little more:

I am using a program which turns the topic from kinect2_bridge into a /dev/video1 V4L2 stream. It can produce BGR3, RGB3, GREY, YV12, or YUYV pixel formats depending on what I choose. If I run it with the YUYV and HD resolution options and then do

v4l2-ctl -d /dev/video1 -V

I get:

$ v4l2-ctl -d /dev/video1 -V
Format Video Capture:
	Width/Height      : 1920/1080
	Pixel Format      : 'YUYV'
	Field             : None
	Bytes per Line    : 3840
	Size Image        : 4147200
	Colorspace        : sRGB
	Transfer Function : Default
	YCbCr Encoding    : Default
	Quantization      : Default
	Flags             :

If I run

gst-launch-1.0 v4l2src device=/dev/video1 ! xvimagesink

I can view the stream. I have experimented with all the formats except GREY, and with a range of resolutions (1280 x 720, 1920 x 1080, 640 x 480).

When I run jetson-inference, it initially says that it has opened the camera stream but then fails. Here is the terminal output:

$ ./imagenet-camera
imagenet-camera
  args (1):  0 [./imagenet-camera]  

[gstreamer] initialized gstreamer, version 1.8.3.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVCAMERA
[gstreamer] gstCamera pipeline string:
v4l2src device=/dev/video1 ! video/x-raw, width=(int)1280, height=(int)720, format=RGB ! videoconvert ! video/x-raw, format=RGB ! videoconvert !appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_V4L2

imagenet-camera:  successfully initialized video device
    width:  1280
   height:  720
    depth:  24 (bpp)


imageNet -- loading classification network model from:
         -- prototxt     networks/googlenet.prototxt
         -- model        networks/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   2

[TRT]  TensorRT version 4.0.2
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  loading network profile from engine cache... networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  device GPU, networks/bvlc_googlenet.caffemodel loaded
[TRT]  device GPU, CUDA engine context initialized with 2 bindings
[TRT]  networks/bvlc_googlenet.caffemodel input  binding index:  0
[TRT]  networks/bvlc_googlenet.caffemodel input  dims (b=2 c=3 h=224 w=224) size=1204224
[cuda]  cudaAllocMapped 1204224 bytes, CPU 0x101540000 GPU 0x101540000
[TRT]  networks/bvlc_googlenet.caffemodel output 0 prob  binding index:  1
[TRT]  networks/bvlc_googlenet.caffemodel output 0 prob  dims (b=2 c=1000 h=1 w=1) size=8000
[cuda]  cudaAllocMapped 8000 bytes, CPU 0x101740000 GPU 0x101740000
device GPU, networks/bvlc_googlenet.caffemodel initialized.
[TRT]  networks/bvlc_googlenet.caffemodel loaded
imageNet -- loaded 1000 class info entries
networks/bvlc_googlenet.caffemodel initialized.
default X screen 0:   1920 x 1080
[OpenGL]  glDisplay display window initialized
[OpenGL]   creating 1280x720 texture
loaded image  fontmapA.png  (256 x 512)  2097152 bytes
[cuda]  cudaAllocMapped 2097152 bytes, CPU 0x101940000 GPU 0x101940000
[cuda]  cudaAllocMapped 8192 bytes, CPU 0x101742000 GPU 0x101742000
[gstreamer] gstreamer transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstCamera onEOS
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> videoconvert1
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> videoconvert0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> v4l2src0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> videoconvert1
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> videoconvert0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> v4l2src0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer msg stream-start ==> pipeline0
[gstreamer] gstreamer msg new-clock ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> videoconvert1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> videoconvert0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer v4l2src0 ERROR Internal data flow error.
[gstreamer] gstreamer Debugging info: gstbasesrc.c(2948): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming task paused, reason not-negotiated (-4)
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> v4l2src0
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink

imagenet-camera:  camera open for streaming

imagenet-camera:  failed to capture frame
imagenet-camera:  failed to convert from NV12 to RGBA
[TRT]  imageNet::Classify( 0x(nil), 1280, 720 ) -> invalid parameters
[cuda]   cudaNormalizeRGBA((float4*)imgRGBA, make_float2(0.0f, 255.0f), (float4*)imgRGBA, make_float2(0.0f, 1.0f), camera->GetWidth(), camera->GetHeight())
[cuda]      invalid device pointer (error 17) (hex 0x11)
[cuda]      /home/nvidia/jetson-inference/imagenet-camera/imagenet-camera.cpp:198
[cuda]   registered 14745600 byte openGL texture for interop access (1280x720)

So it looks like it is trying to convert from NV12 anyway, despite opening an RGB stream under GStreamer. I have set the V4L2 index from -1 to 0 and to 1 and recompiled; neither helps.

I am also not sure how to use the commands you mentioned above. In the terminal I can run, say,

gst-launch-1.0 v4l2src device=/dev/video1 ! video/x-raw, width=1920, height=1080, format=RGB ! videoconvert ! video/x-raw, format=RGB ! videoconvert ! xvimagesink

or even

gst-launch-1.0 v4l2src device=/dev/video1 ! video/x-raw, width=1920, height=1080, format=RGB ! videoconvert ! video/x-raw, format=NV12 ! videoconvert ! xvimagesink

And I can view the stream. Then I try with jetson-inference, but I get the same errors.

I also tried

gst-launch-1.0 v4l2src device=/dev/video1 ! video/x-raw, width=1920, height=1080, format=RGB ! videoconvert ! video/x-raw, format=NV12 ! videoconvert ! appsink name=mysink

as appsink name=mysink is the sink named in the log output. That did not work either. Should running these in the terminal have any effect on running imagenet-camera?

Additionally, I have looked into substituting a function from cudaYUV.h for ConvertRGBA() in imagenet-camera.cpp, but again I am running into the limits of my coding ability here. I changed

if( !camera->ConvertRGBA(imgCUDA, &imgRGBA) )
		printf("imagenet-camera:  failed to convert from NV12 to RGBA\n");

to

if( !cudaYUYVToRGBA(imgCUDA, &imgRGBA, camera->GetWidth(), camera->GetHeight()) )
	printf("imagenet-camera:  failed to convert from YUYV to RGBA\n");

but of course the types of the arguments are different, with ConvertRGBA() requiring

ConvertRGBA( void* input, void** output, bool zeroCopy )

and cudaYUV.h’s function requiring:

cudaError_t cudaYUYVToRGBA( uchar2* input, uchar4* output, size_t width, size_t height )

So I get errors relating to type conversions. I know enough to realise that cudaYUYVToRGBA is not a member function of the camera object, but I am not sure whether simply #include-ing the header file and calling the function is the correct way to handle it.
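In case it helps show what I was attempting, here is a minimal sketch of how I imagine the call would look, with a hypothetical rgba8GPU buffer allocated up front via cudaAllocMapped() and the CUDA_FAILED() macro from cudaUtility.h (I am not certain this is how the pieces are meant to fit together):

#include "cudaYUV.h"

// hypothetical uchar4 output buffer, allocated once at startup:
//   void* rgba8CPU = NULL;
//   void* rgba8GPU = NULL;
//   cudaAllocMapped(&rgba8CPU, &rgba8GPU, camera->GetWidth() * camera->GetHeight() * sizeof(uchar4));

if( CUDA_FAILED(cudaYUYVToRGBA((uchar2*)imgCUDA, (uchar4*)rgba8GPU,
                               camera->GetWidth(), camera->GetHeight())) )
	printf("imagenet-camera:  failed to convert from YUYV to RGBA\n");

I realise too that cudaYUYVToRGBA() outputs uchar4 pixels, while the rest of imagenet-camera seems to work on the float4 RGBA that ConvertRGBA() produces, so presumably a further conversion step would still be needed before Classify().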

Sorry, I know this is a big post and a lot to ask; I am drowning a bit and could really use some detailed help! :)

Thank you

That NV12 message is spurious; from the log, it is actually running in V4L2 RGB mode. As seen in this line of code from imagenet-camera, the program always prints NV12 when that ConvertRGBA() call fails (my bad - NV12 is the format used by the onboard MIPI CSI camera, not the V4L2 format).
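For reference, it is the same check you quoted earlier - the format name in the message is hardcoded:

if( !camera->ConvertRGBA(imgCUDA, &imgRGBA) )
	printf("imagenet-camera:  failed to convert from NV12 to RGBA\n");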

Further, ConvertRGBA() is likely failing because it isn't getting proper data from V4L2, as these messages below are also in the log, indicating an internal problem within the GStreamer V4L2 interface:

[gstreamer] gstreamer v4l2src0 ERROR Internal data flow error.
[gstreamer] gstreamer Debugging info: gstbasesrc.c(2948): gst_base_src_loop (): /GstPipeline:pipeline0/GstV4l2Src:v4l2src0:
streaming task paused, reason not-negotiated (-4)

Since the pipelines you launched from the command line work to view the camera, and those use resolution 1920x1080 in RGB format, you may want to try creating the gstCamera object in imagenet-camera.cpp with resolution 1920x1080 as well. By default, the gstCamera class opens the camera at 1280x720, which the Kinect driver maybe doesn't like. You could try changing imagenet-camera.cpp:73 to the below:

gstCamera* camera = gstCamera::Create(1920, 1080, DEFAULT_CAMERA);

I think the V4L2 index (DEFAULT_CAMERA) should be set to 1 (which maps to /dev/video1), as the onboard MIPI CSI camera should be /dev/video0. You should be able to determine which camera is which using the v4l2-ctl utility to query the device info:

$ sudo apt-get install v4l-utils
$ v4l2-ctl --list-devices
$ v4l2-ctl --info
$ v4l2-ctl --all

That said, if the test GStreamer pipeline that you are manually launching from the command line is showing the Kinect video on /dev/video1, then the right DEFAULT_CAMERA index for your case is 1. v4l2-ctl is a nifty utility to have, though.
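Assuming your copy of imagenet-camera.cpp still has the stock define near the top, the change would just be:

#define DEFAULT_CAMERA 1	// was -1 (onboard MIPI CSI camera); 1 selects /dev/video1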

If you are unable to resolve the issue, you may want to try my suggestion with the ROS topic here.
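And if you do eventually need a direct YV12-to-RGBA path on the GPU, one option is writing a small CUDA kernel yourself. YV12 is planar: a full-resolution Y plane followed by quarter-resolution V and U planes. Below is a rough, untested sketch - the kernel name, launch configuration, and BT.601 full-range coefficients are my own choices, not anything that exists in cudaYUV.h:

#include <stdint.h>
#include <cuda_runtime.h>

__global__ void gpuYV12ToRGBA( uint8_t* input, uchar4* output, int width, int height )
{
	const int x = blockIdx.x * blockDim.x + threadIdx.x;
	const int y = blockIdx.y * blockDim.y + threadIdx.y;

	if( x >= width || y >= height )
		return;

	// YV12 layout: Y plane (width*height bytes), then V plane,
	// then U plane (each (width/2)*(height/2) bytes)
	const int chromaPlane = (width / 2) * (height / 2);
	const int chromaIdx   = (y / 2) * (width / 2) + (x / 2);

	const float Y = input[y * width + x];
	const float V = input[width * height + chromaIdx];
	const float U = input[width * height + chromaPlane + chromaIdx];

	// BT.601 full-range YUV -> RGB
	const float r = Y + 1.402f * (V - 128.0f);
	const float g = Y - 0.344136f * (U - 128.0f) - 0.714136f * (V - 128.0f);
	const float b = Y + 1.772f * (U - 128.0f);

	output[y * width + x] = make_uchar4(
		(unsigned char)fminf(fmaxf(r, 0.0f), 255.0f),
		(unsigned char)fminf(fmaxf(g, 0.0f), 255.0f),
		(unsigned char)fminf(fmaxf(b, 0.0f), 255.0f),
		255 );
}

cudaError_t cudaYV12ToRGBA( uint8_t* input, uchar4* output, size_t width, size_t height )
{
	if( !input || !output || width == 0 || height == 0 )
		return cudaErrorInvalidValue;

	const dim3 block(8, 8);
	const dim3 grid((width + 7) / 8, (height + 7) / 8);

	gpuYV12ToRGBA<<<grid, block>>>(input, output, (int)width, (int)height);
	return cudaGetLastError();
}

That would give you a uchar4 image; you would still need a further step to produce the float4 layout that imageNet::Classify() expects.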