CPU usage does not go down (using cuda decode)

Hi SunYe,

Any update? Could you share the result and progress?

Thanks

I can't share the project code because my company forbids it.
The key TX1 decoder code is already shared in the first post.

Results:
1. H264 @ 10 Mbit/s (frame rate: 25)
2 channels H264 @ 10M, ffmpeg decode: drops frames, CPU 30% (including RTSP)
1 channel H264 @ 10M, ffmpeg decode: OK, CPU 25% (including RTSP)

2 channels H264 @ 10M, cuda decode: OK, CPU 32% (including RTSP)
3 channels H264 @ 10M, cuda decode: drops frames (including RTSP)

2. H265 @ 10 Mbit/s (frame rate: 25)
1 channel H265 @ 10M, ffmpeg decode: drops frames, CPU 25% (stream received over a private protocol)
2 channels H265 @ 10M, cuda decode: OK, CPU 30% (stream received over a private protocol)
3 channels H265 @ 10M, cuda decode: drops frames

cuda decode can only handle 2 channels of H264/H265 @ 10M, no more.

We have verified four simultaneous 1080p25 transcoding sessions on TX1. The same should work when using the MM APIs.
https://devtalk.nvidia.com/default/topic/979908/jetson-tx1/gstreamer-transcoding-performance-issue/post/5033461/#5033461

And please let me emphasize again that it is not ‘cuda decode’. The HW decoder on TX1/TX2 is a separate hardware engine, not the GPU.

My scenario is receiving video data in real time and decoding it, not transcoding.

If the average decode time per frame cannot stay below 1000/25 = 40 ms, the H264/H265 data will overflow.

If I feed more than 2 channels of H264/H265 data (10 Mbit of data per second) into the TX1 decoder,

the TX1 decode engine cannot decode quickly enough, so the H264/H265 data overflows.

And please let me emphasize again: our scenario is real-time decode, not offline decode,

so the performance evaluation method is different, and the result is also different.
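
The 40 ms budget above can be written out as a small calculation. This is only a sketch of the arithmetic, not code from the project: the function names are made up for illustration, the 50 ms decode time is a hypothetical value, and the capacity of 100 frames is the list size mentioned later in this thread.

```python
import math


def frame_budget_ms(fps: float) -> float:
    """Time available to decode one frame, in milliseconds."""
    return 1000.0 / fps


def backlog_growth_fps(avg_decode_ms: float, fps: float) -> float:
    """Frames per second by which the input backlog grows (0 if the decoder keeps up)."""
    decode_rate = 1000.0 / avg_decode_ms  # frames the decoder finishes per second
    return max(0.0, fps - decode_rate)


def seconds_until_overflow(avg_decode_ms: float, fps: float, capacity: int) -> float:
    """Seconds until a buffer of `capacity` frames is full, or infinity if it never fills."""
    growth = backlog_growth_fps(avg_decode_ms, fps)
    return math.inf if growth == 0 else capacity / growth


print(frame_budget_ms(25))                    # 40.0
print(seconds_until_overflow(50.0, 25, 100))  # 20.0 (hypothetical 50 ms average decode time)
```

With a decode time at or below the budget the backlog never grows, which is the condition described above for real-time decode to be sustainable.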

You can use live555 to set up an RTSP server and run this command:
gst-launch-1.0 rtspsrc location="rtsp://192.168.110.232/3.mkv" ! rtph264depay ! h264parse ! omxh264dec ! nvoverlaysink -e
to test the real-time decode performance.

Offline decode performance and real-time decode performance are different.

I did not use this command; I tested with my own RTSP program.

And I did not use a live555 RTSP server; I used an IP camera as the RTSP source.

The result: TX1 real-time decode performance is 2 channels of H264/H265 @ 10 Mbit, 25 fps, no more.

We have verified playback of four 1080p25 @ 7.5 Mbit streams using two TX1 boards running r24.2.1. One TX1 acts as the server and the other as the client.

Test video file: bourne_ultimatum_trailer.zip (The Bourne Ultimatum, High Definition (1080p) Theatrical Trailer, from dvdloc8.com)

Server
Compile test-mp4.c from gst-rtsp-server (master branch of GStreamer/gst-rtsp-server on GitHub)
Start the RTSP server

$ ./test-mp4 Bourne_Trailer.mp4

Client

$ export RTSP_PATH=rtsp://10.19.106.151:8554/test
$ gst-launch-1.0 rtspsrc location="$RTSP_PATH" ! rtph264depay ! h264parse ! omxh264dec ! nveglglessink window-x=100 window-y=100 window-width=640 window-height=360 & gst-launch-1.0 rtspsrc location="$RTSP_PATH" ! rtph264depay ! h264parse ! omxh264dec ! nveglglessink window-x=800 window-y=100 window-width=640 window-height=360 & gst-launch-1.0 rtspsrc location="$RTSP_PATH" ! rtph264depay ! h264parse ! omxh264dec ! nveglglessink window-x=100 window-y=500 window-width=640 window-height=360  & gst-launch-1.0 rtspsrc location="$RTSP_PATH" ! rtph264depay ! h264parse ! omxh264dec ! nveglglessink window-x=800 window-y=500 window-width=640 window-height=360

The result is identical to offline decoding.
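
For anyone who wants to vary the number of client windows, the four commands above can be generated instead of typed by hand. This is a rough sketch that only builds the command strings: the helper names and the grid spacing parameters are made up for illustration, while the pipeline elements and window properties are the ones used in the client command above.

```python
import shlex


def grid_positions(cols, rows, x0=100, y0=100, dx=700, dy=400):
    """Top-left window coordinates for a cols x rows grid, row by row."""
    return [(x0 + c * dx, y0 + r * dy) for r in range(rows) for c in range(cols)]


def build_client_commands(rtsp_url, cols=2, rows=2, width=640, height=360):
    """One gst-launch-1.0 command line per window, same pipeline as the client above."""
    commands = []
    for x, y in grid_positions(cols, rows):
        commands.append(
            f"gst-launch-1.0 rtspsrc location={shlex.quote(rtsp_url)} "
            "! rtph264depay ! h264parse ! omxh264dec "
            f"! nveglglessink window-x={x} window-y={y} "
            f"window-width={width} window-height={height}"
        )
    return commands


for command in build_client_commands("rtsp://10.19.106.151:8554/test"):
    print(command)
```

The defaults reproduce the 2x2 layout used in the test; the printed lines can be joined with " & " to start all clients at once.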

Please test H264 @ 10 Mbit and H265 @ 10 Mbit.
I also suggest running the RTSP server on a Linux virtual machine
(use the graphical task manager to check whether the bitstream really is 10 Mbit).

You should also record the video data to see whether frames are dropped.
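
One possible way to check a recording for dropped frames is to look at the gaps between frame timestamps. The sketch below assumes you can export the presentation timestamps of the recorded frames in milliseconds; the function name is made up for illustration and this is not how the results in this thread were measured.

```python
def count_missing_frames(timestamps_ms, fps=25.0):
    """Estimate how many frames are missing from a list of frame timestamps.

    A gap of about one frame interval means nothing was lost; a gap of
    about N intervals is counted as N - 1 missing frames.
    """
    interval = 1000.0 / fps
    missing = 0
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        steps = round((cur - prev) / interval)
        if steps > 1:
            missing += steps - 1
    return missing


# One frame is missing between 80 ms and 160 ms at 25 fps.
print(count_missing_frames([0, 40, 80, 160, 200]))  # 1
```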

My scenario: I receive RTSP data, store the video frames in a list, and feed them to the TX1 decoder.
With more than 2 channels the list overflows. The list holds at most 100 frames, so anything beyond that overflows.
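
A bounded frame list like the one described can be sketched as follows. This is not the project code (which was not shared); the class name and the overflow counter are hypothetical, and only the limit of 100 frames comes from the post above. The receiver thread calls push() and the decoder thread calls pop().

```python
import threading
from collections import deque


class FrameList:
    """Thread-safe FIFO of encoded frames with a fixed capacity."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.overflows = 0  # number of frames rejected because the list was full
        self._frames = deque()
        self._lock = threading.Lock()

    def push(self, frame: bytes) -> bool:
        """Store a received frame; return False if the list is already full."""
        with self._lock:
            if len(self._frames) >= self.capacity:
                self.overflows += 1
                return False
            self._frames.append(frame)
            return True

    def pop(self):
        """Return the oldest frame for decoding, or None if the list is empty."""
        with self._lock:
            return self._frames.popleft() if self._frames else None

    def __len__(self):
        with self._lock:
            return len(self._frames)
```

Watching the overflows counter per channel shows at which channel count the decoder stops keeping up with the incoming stream.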

Hi SunYe,
We also verified four simultaneous offline decodes:

ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/02_video_dec_cuda$ date
Fri Jul 21 03:10:01 UTC 2017
ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/02_video_dec_cuda$ ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 100 -wy 100 -ww 640 -wh 360 -fps 25 & ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 100 -wy 600 -ww 640 -wh 360 -fps 25 & ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 800 -wy 100 -ww 640 -wh 360 -fps 25 & ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 800 -wy 600 -ww 640 -wh 360 -fps 25 &
[1] 8235
[2] 8236
[3] 8237
[4] 8238
ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/02_video_dec_cuda$ Failed to query video capabilities: Bad address
NvMMLiteOpen : Block : BlockType = 261
TVMR: NvMMLiteTVMRDecBlockOpen: 7580: NvMMLiteBlockOpen
Failed to query video capabilities: Bad address
Failed to query video capabilities: Bad address
NvMMLiteBlockCreate : Block : BlockType = 261
NvMMLiteOpen : Block : BlockType = 261
TVMR: NvMMLiteTVMRDecBlockOpen: 7580: NvMMLiteBlockOpen
Failed to query video capabilities: Bad address
Failed to query video capabilities: Bad address
NvMMLiteOpen : Block : BlockType = 261
TVMR: NvMMLiteTVMRDecBlockOpen: 7580: NvMMLiteBlockOpen
NvMMLiteBlockCreate : Block : BlockType = 261
NvMMLiteBlockCreate : Block : BlockType = 261
Failed to query video capabilities: Bad address
NvMMLiteOpen : Block : BlockType = 261
Starting decoder capture loop thread
TVMR: NvMMLiteTVMRDecBlockOpen: 7580: NvMMLiteBlockOpen
Failed to query video capabilities: Bad address
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Failed to query video capabilities: Bad address
TVMR: cbBeginSequence: 1166: BeginSequence  1920x816, bVPR = 0, fFrameRate = 23.975986
TVMR: LowCorner Frequency = 0
TVMR: cbBeginSequence: 1545: DecodeBuffers = 2, pnvsi->eCodec = 4, codec = 0
TVMR: cbBeginSequence: 1606: Display Resolution : (1920x816)
TVMR: cbBeginSequence: 1607: Display Aspect Ratio : (1920x816)
TVMR: cbBeginSequence: 1649: ColorFormat : 5
TVMR: cbBeginSequence:1660 ColorSpace = NvColorSpace_YCbCr709
TVMR: cbBeginSequence: 1790: SurfaceLayout = 3
TVMR: cbBeginSequence: 1868: NumOfSurfaces = 3, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
TVMR: cbBeginSequence: 1166: BeginSequence  1920x816, bVPR = 0, fFrameRate = 23.975986
TVMR: LowCorner Frequency = 0
TVMR: cbBeginSequence: 1545: DecodeBuffers = 2, pnvsi->eCodec = 4, codec = 0
Video Resolution: 1920x816
TVMR: cbBeginSequence: 1606: Display Resolution : (1920x816)
TVMR: cbBeginSequence: 1607: Display Aspect Ratio : (1920x816)
TVMR: cbBeginSequence: 1649: ColorFormat : 5
TVMR: cbBeginSequence:1660 ColorSpace = NvColorSpace_YCbCr709
TVMR: cbBeginSequence: 1790: SurfaceLayout = 3
TVMR: cbBeginSequence: 1868: NumOfSurfaces = 3, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
TVMR: cbBeginSequence: 1166: BeginSequence  1920x816, bVPR = 0, fFrameRate = 23.975986
TVMR: LowCorner Frequency = 0
TVMR: cbBeginSequence: 1545: DecodeBuffers = 2, pnvsi->eCodec = 4, codec = 0
TVMR: cbBeginSequence: 1606: Display Resolution : (1920x816)
TVMR: cbBeginSequence: 1607: Display Aspect Ratio : (1920x816)
TVMR: cbBeginSequence: 1649: ColorFormat : 5
TVMR: cbBeginSequence:1660 ColorSpace = NvColorSpace_YCbCr709
TVMR: cbBeginSequence: 1790: SurfaceLayout = 3
TVMR: cbBeginSequence: 1868: NumOfSurfaces = 3, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
Video Resolution: 1920x816
Starting decoder capture loop thread
Video Resolution: 1920x816
TVMR: cbBeginSequence: 1166: BeginSequence  1920x816, bVPR = 0, fFrameRate = 23.975986
TVMR: LowCorner Frequency = 0
TVMR: cbBeginSequence: 1545: DecodeBuffers = 2, pnvsi->eCodec = 4, codec = 0
TVMR: cbBeginSequence: 1606: Display Resolution : (1920x816)
TVMR: cbBeginSequence: 1607: Display Aspect Ratio : (1920x816)
TVMR: cbBeginSequence: 1649: ColorFormat : 5
TVMR: cbBeginSequence:1660 ColorSpace = NvColorSpace_YCbCr709
TVMR: cbBeginSequence: 1790: SurfaceLayout = 3
TVMR: cbBeginSequence: 1868: NumOfSurfaces = 3, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1, BitDepthForSurface = 8 LumaBitDepth = 8, ChromaBitDepth = 8, ChromaFormat = 5
Starting decoder capture loop thread
Video Resolution: 1920x816
libv4l2_nvvidconv (0):(765) (INFO) : Allocating (8) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(775) (INFO) : Allocating (8) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
libv4l2_nvvidconv (0):(765) (INFO) : Allocating (8) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(775) (INFO) : Allocating (8) CAPTURE PLANE BUFFERS Layout=0
libv4l2_nvvidconv (0):(765) (INFO) : Allocating (8) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(775) (INFO) : Allocating (8) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
Query and set capture successful
libv4l2_nvvidconv (0):(765) (INFO) : Allocating (8) OUTPUT PLANE BUFFERS Layout=1
libv4l2_nvvidconv (0):(775) (INFO) : Allocating (8) CAPTURE PLANE BUFFERS Layout=0
Query and set capture successful
TVMR: FrameRate = 23.975986

(skip...)

TVMR: NvMMLiteTVMRDecDoWork: 6466: NVMMLITE_TVMR: EOS detected
Input file read complete
TVMR: NvMMLiteTVMRDecDoWork: 6466: NVMMLITE_TVMR: EOS detected
Input file read complete
TVMR: NvMMLiteTVMRDecDoWork: 6466: NVMMLITE_TVMR: EOS detected
Input file read complete
TVMR: NvMMLiteTVMRDecDoWork: 6466: NVMMLITE_TVMR: EOS detected
Input file read complete
TVMR: FrameRate = 23.975986

(skip...)

TVMR: TVMRBufferProcessing: 5444: Processing of EOS
TVMR: TVMRBufferProcessing: 5519: Processing of EOS Done
Exiting decoder capture loop thread
TVMR: TVMRBufferProcessing: 5444: Processing of EOS
TVMR: TVMRBufferProcessing: 5444: Processing of EOS
TVMR: TVMRBufferProcessing: 5444: Processing of EOS
TVMR: TVMRBufferProcessing: 5519: Processing of EOS Done
Exiting decoder capture loop thread
TVMR: TVMRBufferProcessing: 5519: Processing of EOS Done
Exiting decoder capture loop thread
TVMR: TVMRBufferProcessing: 5519: Processing of EOS Done
Exiting decoder capture loop thread
TVMR: TVMRFrameStatusReporting: 6067: Closing TVMR Frame Status Thread -------------
TVMR: TVMRVPRFloorSizeSettingThread: 5885: Closing TVMRVPRFloorSizeSettingThread -------------
TVMR: TVMRFrameDelivery: 5917: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMLiteTVMRDecBlockClose: 7740: Done
App run was successful
TVMR: TVMRFrameStatusReporting: 6067: Closing TVMR Frame Status Thread -------------
TVMR: TVMRVPRFloorSizeSettingThread: 5885: Closing TVMRVPRFloorSizeSettingThread -------------
TVMR: TVMRFrameDelivery: 5917: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMLiteTVMRDecBlockClose: 7740: Done
TVMR: TVMRFrameStatusReporting: 6067: Closing TVMR Frame Status Thread -------------
TVMR: TVMRVPRFloorSizeSettingThread: 5885: Closing TVMRVPRFloorSizeSettingThread -------------
TVMR: TVMRFrameDelivery: 5917: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMLiteTVMRDecBlockClose: 7740: Done
App run was successful
TVMR: TVMRFrameStatusReporting: 6067: Closing TVMR Frame Status Thread -------------
TVMR: TVMRVPRFloorSizeSettingThread: 5885: Closing TVMRVPRFloorSizeSettingThread -------------
TVMR: TVMRFrameDelivery: 5917: Closing TVMR Frame Delivery Thread -------------
TVMR: NvMMLiteTVMRDecBlockClose: 7740: Done
App run was successful
App run was successful

[1]   Done                    ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 100 -wy 100 -ww 640 -wh 360 -fps 25
[2]   Done                    ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 100 -wy 600 -ww 640 -wh 360 -fps 25
[3]-  Done                    ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 800 -wy 100 -ww 640 -wh 360 -fps 25
[4]+  Done                    ./video_dec_cuda ~/Bourne_Trailer.h264 H264 -wx 800 -wy 600 -ww 640 -wh 360 -fps 25
ubuntu@tegra-ubuntu:~/tegra_multimedia_api/samples/02_video_dec_cuda$ date
Fri Jul 21 03:11:32 UTC 2017

But I think the test results differ case by case. One factor that may cause a difference between offline decoding and online streaming is network bandwidth, but other users can probably share more experience.

Hi DaneLLL,
I want to use tegra_multimedia_api/samples/02_video_dec_cuda to get a single frame as a picture. Can you tell me how to do this, and how to use 02_video_dec_cuda to write a file? Something like "./video_dec_cuda …/…/data/Video/321.h264 H264 -o ./"?

Please run ‘./video_dec_cuda --help’ to get usage in detail

Hi DaneLLL,
-o writes to an output file, but can you tell me which file type it writes correctly? With a command like "./video_dec_cuda …/…/data/Video/321.h264 H264 -o ./test.mp4", the file is wrong and I can't read it correctly.

The output file is raw YUV frames. You have to open it with a YUV viewer.
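
If you want to pull a single frame out of such a raw file yourself, the sketch below shows the idea. It assumes the file is 8-bit planar YUV 4:2:0 (I420) with no padding, which may not match what the sample actually writes (for example NV12), so please check the layout first. The function names are made up for illustration, and the width and height must match your own video (1920x816 is only the resolution from the log earlier in this thread).

```python
def i420_frame_size(width: int, height: int) -> int:
    """Bytes per frame for 8-bit planar YUV 4:2:0: full-size Y plus quarter-size U and V."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even for 4:2:0")
    return width * height * 3 // 2


def split_i420(frame: bytes, width: int, height: int):
    """Split one raw frame into its Y, U and V planes."""
    if len(frame) != i420_frame_size(width, height):
        raise ValueError("frame has the wrong size for the given resolution")
    y_size = width * height
    c_size = y_size // 4
    return frame[:y_size], frame[y_size:y_size + c_size], frame[y_size + c_size:]


def read_frame(path: str, width: int, height: int, index: int = 0):
    """Read frame number `index` from a raw I420 file and return its (Y, U, V) planes."""
    size = i420_frame_size(width, height)
    with open(path, "rb") as f:
        f.seek(index * size)
        data = f.read(size)
    if len(data) != size:
        raise ValueError("file ends before the requested frame")
    return split_i420(data, width, height)


print(i420_frame_size(1920, 816))  # 2350080
```

The Y plane alone is already a grayscale picture of the frame; converting to RGB needs the U and V planes as well.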

Hi DaneLLL,
I found that 02_video_dec_cuda can only decode H264/H265 files. Now I want to decode an RTSP stream from an IP camera in real time. I don't know how to receive an RTSP stream with 02_video_dec_cuda. Can you give me some advice?

tegra_multimedia_api supports low-level H264/H265 decoding. For RTSP, you have to implement it yourself.

You can also leverage the GStreamer framework.