GStreamer 1.0 performance issue on TK1

I’ve written a program that uses gstreamer-1.0 to perform transcoding on a TK1. It produces raw RGB16 images of size 385 × 288 every 1/30 s, concurrently for 60 pipelines (each
pipeline generates about 30 frames per second).

There are two test cases in my program; the pipelines are organized as follows:
Case 1: appsrc ! videoconvert ! jpegenc ! appsink
Case 2: appsrc ! videoconvert ! nvjpegenc ! appsink
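
For quick benchmarking outside the application, the two cases can be approximated with gst-launch, substituting videotestsrc for the appsrc (the caps below are my assumption based on the sizes described, not the program’s actual caps):

```shell
# Case 1 equivalent (SW JPEG encode); videotestsrc stands in for the appsrc
gst-launch-1.0 videotestsrc num-buffers=300 ! \
  'video/x-raw, format=(string)RGB16, width=(int)385, height=(int)288, framerate=(fraction)30/1' ! \
  videoconvert ! jpegenc ! fakesink -e

# Case 2 equivalent (nvjpegenc)
gst-launch-1.0 videotestsrc num-buffers=300 ! \
  'video/x-raw, format=(string)RGB16, width=(int)385, height=(int)288, framerate=(fraction)30/1' ! \
  videoconvert ! nvjpegenc ! fakesink -e
```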

The CPU utilization and FPS for 60 pipelines:
       Case 1      Case 2
CPU    98%-100%    45%-55%
FPS    29-30       14-15

Here are my questions:

  1. For case 2, is the encoding actually dispatched to the GPU,
    so that CPU resources are relinquished to other processes?
  2. Why does using ‘nvjpegenc’ degrade the FPS?
  3. How can I determine the GPU utilization of a specific process on the TK1?
    (I’ve tried ‘nvidia-smi’, but it doesn’t seem to be supported on the TK1.)
  4. If my program uses ‘omxh264dec’ and ‘omxvp8enc’ to transcode an H.264-encoded .mp4 file
    to a VP8-encoded .mp4 file, will it utilize the GPU resources?

nvjpegenc should use HW for JPEG encoding, but I haven’t tested that myself. omxh264dec and omxvp8enc will use HW for decoding and encoding.

Have you maximised the CPU, GPU, and EMC clocks?
http://elinux.org/Jetson/Performance
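
For reference, the commands from that page look roughly like the sketch below (run as root; the sysfs/debugfs paths come from the elinux page and may differ between L4T releases):

```shell
# disable CPU hotplug so all cores stay online
echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
# pin the CPU governor to performance
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# max the GPU (gbus) clock
cat /sys/kernel/debug/clock/gbus/max_rate > /sys/kernel/debug/clock/override.gbus/rate
echo 1 > /sys/kernel/debug/clock/override.gbus/state
# max the memory (EMC) clock
cat /sys/kernel/debug/clock/emc/max_rate > /sys/kernel/debug/clock/override.emc/rate
echo 1 > /sys/kernel/debug/clock/override.emc/state
```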

In the pipelines above, videoconvert is a SW operation and thus very slow. You may try nvvidconv instead, but I’m not sure whether it supports RGB16 (RGB565).
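
A sketch of what the pipeline could look like with nvvidconv in place of videoconvert, assuming nvvidconv accepts the input format (videotestsrc stands in for the appsrc, and I420 is used here only as a placeholder format):

```shell
gst-launch-1.0 videotestsrc num-buffers=300 ! \
  'video/x-raw, format=(string)I420, width=(int)385, height=(int)288, framerate=(fraction)30/1' ! \
  nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvjpegenc ! fakesink -e
```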

Even if you were able to do everything HW-accelerated, 60 streams of 385x288 @ 30 FPS might be too much.
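
For scale, the aggregate pixel rate of that workload works out to roughly 200 Mpix/s:

```shell
# 60 streams x (385 x 288) pixels x 30 fps
echo $((60 * 385 * 288 * 30))   # 199584000 pixels/s
```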

Now I’ve removed the color space converter and made the encoder (nvjpegenc) consume RGB (24-bit) raw data directly from ‘appsrc’, but the FPS did not improve.

"The CPU utilization and FPS for 60 pipelines:
       Case 1      Case 2
CPU    98%-100%    45%-55%
FPS    29-30       14-15"
=> Could you experiment with a lower load, say 10 pipelines instead of 60, and see the outcome?

Also, if you look at the Jetson TK1 TRM, 13.3.9 Pixel Format Support, the 24-bit color format is not directly supported, so you should replace it with the 32-bit color format that has an 8-bit alpha, and try that again. So, two experiments. Yes, when you use the nvjpegenc lib, it will use the JPEG encoder HW.
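
A sketch of what the second experiment might look like, with the appsrc caps switched to a 32-bit format carrying an 8-bit alpha/padding byte (videotestsrc stands in for the appsrc; whether nvjpegenc accepts RGBA directly is untested here):

```shell
gst-launch-1.0 videotestsrc num-buffers=300 ! \
  'video/x-raw, format=(string)RGBA, width=(int)385, height=(int)288, framerate=(fraction)30/1' ! \
  nvjpegenc ! fakesink -e
```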

Hello.

I already asked this question in the thread https://devtalk.nvidia.com/default/topic/819963/?offset=9#4589012, but I am duplicating it here.

I have a similar problem with JPEG encoding.
The command
for i in $(seq 1 1000); do
  gst-launch-0.10 filesrc location=./img_2592x1944_pitch2592 blocksize=5038848 ! \
    "video/x-raw-gray, bpp=8, width=(int)2592, height=(int)1944, framerate=(fraction)1/1, format=(fourcc)I420" ! \
    jpegenc ! \
    fakesink -e
done
takes 1m24s to run. The command
for i in $(seq 1 1000); do
  gst-launch-0.10 filesrc location=./img_2592x1944_pitch2592 blocksize=5038848 ! \
    "video/x-raw-gray, bpp=8, width=(int)2592, height=(int)1944, framerate=(fraction)1/1, format=(fourcc)I420" ! \
    nvjpegenc ! \
    fakesink -e
done
takes 2m45s to run.
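
For comparison, the per-frame times over the 1000 iterations work out as:

```shell
# 1000 frames in 1m24s vs. 1000 frames in 2m45s
echo $(( (1*60 + 24) * 1000 / 1000 ))   # jpegenc:   84 ms/frame
echo $(( (2*60 + 45) * 1000 / 1000 ))   # nvjpegenc: 165 ms/frame
```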

Clearly some of that time goes to loading the file, but the standard jpegenc still encodes faster than the NVIDIA encoder.

Why does this happen?

I have read http://elinux.org/Jetson/Performance and executed its steps.

NVIDIA support advised running this command:
gst-launch-1.0 videotestsrc num-buffers=1000 ! 'video/x-raw, width=(int)2592, height=(int)1944, framerate=(fraction)1/1, format=(string)GRAY8' ! queue ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! queue ! nvjpegenc ! fakesink -v -e