MPEG2 hardware decoding failure on Jetson TK1

Hi everyone. I’ve got my Jetson and already flashed a fresh Ubuntu 14.04 system onto it with R21.4. I have several questions related to video and/or GPU:

  1. I use VLC as a regular user to play MPEG2 content. If the user is not a member of the “video” group, I see the error message “**** NvRmMemInit failed ****” in the console, but VLC is able to play the video (on the CPU, I guess). If the user is a member of the “video” group, VLC opens /dev/nvhost-ctrl and tries to play the video using MPEG2 hardware acceleration, but I see only the very first frame of the video and nothing more. The progress bar keeps moving, but the VLC window shows just that first static frame. Console messages:
# sudo -H -u luna vlc -I dummy /data/mpeg2.mp4
VLC media player 2.1.6 Rincewind (revision 2.1.6-0-gea01d28)
shm_open() failed: Permission denied
[0xbd6e8] dummy interface: using the dummy interface module...
Inside NvxLiteH264DecoderLowLatencyInitNvxLiteH264DecoderLowLatencyInit set DPB and MjstreamingNvMMLiteOpen : Block : BlockType = 267 
TVMR: NvMMLiteTVMRDecBlockOpen: 4937: NvMMLiteBlockOpen 
NvMMLiteBlockCreate : Block : BlockType = 267 
TVMR: cbBeginSequence: 571: BeginSequence  1280x720, bVPR = 0
TVMR: cbBeginSequence: 813: DecodeBuffers = 3 
TVMR: cbBeginSequence: 833: Display Resolution : (1280x720) 
TVMR: cbBeginSequence: 834: Display Aspect Ratio : (1280x720) 
TVMR: cbBeginSequence: 998: SurfaceLayout = 3
TVMR: cbBeginSequence: 1028: NumOfSurfaces = 7, InteraceStream = 0, InterlaceEnabled = 0, bSecure = 0, MVC = 0 Semiplanar = 1, bReinit = 1 
Allocating new output: 1280x720 (x 9), ThumbnailMode = 0
Fontconfig warning: FcPattern object size does not accept value "0"
Fontconfig warning: FcPattern object size does not accept value "0"
Fontconfig warning: FcPattern object size does not accept value "0"
Fontconfig warning: FcPattern object size does not accept value "0"
[0xf6db8] main vout display error: Failed to resize display

If I add the “-v” option, I see tons of “picture is too late to be displayed (missing XXX ms.)” messages.

And even worse: if I build my own player on top of libvlc and run it as root (a requirement), the player always fails to play the video, since root doesn’t need to be in the “video” group to reach the hardware decoder.

So the question is: why does the hardware decoding fail? And how can I tell VLC not to use the nVidia hardware decoding WITHOUT removing the normal user from the “video” group, or when running my own player as root?

  2. glxinfo says “Direct Rendering: Yes”. If I run glxgears I see about 2000 FPS. Is that an expected result on Jetson?

I don’t know how to “fix” the issue, but FYI, membership in group video is what allows CUDA GPU access (it shouldn’t be required just for OpenGL ES to display). How the software deals with a lack of GPU access for CUDA is its own choice, e.g., fall back to CPU or bail out with an error.
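Since the behavior in this thread changes with “video” group membership, it can help to confirm the membership before each test run. A minimal sketch, assuming a POSIX shell with the standard `id`, `tr` and `grep` tools; the function name `user_in_group` is made up for illustration:

```shell
#!/bin/sh
# Succeeds (exit 0) if user $1 is a member of group $2.
# Both names are arguments; nothing here is Jetson-specific.
user_in_group() {
    # id -nG prints the user's group names on one line, space-separated;
    # put one name per line and look for an exact match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Example (commented out so the file can be sourced safely):
# if user_in_group luna video; then
#     echo "luna is in the video group"
# else
#     echo "luna is NOT in the video group"
# fi
```

Group changes normally take effect only in a new login session, so a user who was just added to the group may need to log in again before retesting.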

It’s possible that your CUDA install is not complete. To check basic hardware access libraries you can run this:

sha1sum -c /etc/nv_tegra_release

I’m not sure if there is a similar command or check for validating if CUDA is/isn’t also installed completely and with correct permissions. Anyone have a fast method to validate CUDA install?

If you’ve installed the CUDA samples, you should be able to run them; they are typically installed in the home folder of the “ubuntu” user.

Group membership is not required for root. As root, a video application (VLC, for example) will always try to use hardware decoding. For me the hardware decoding doesn’t work and I’d like to disable it, but I don’t know how.

“OK” status for everything.

One thing I see which might be useful (or might not) was trying to run vlc directly as root on Jetson TK1 R21.4, after sudo -s:

VLC is not supposed to be run as root. Sorry.
If you need to use real-time priorities and/or privileged TCP ports
you can use vlc-wrapper (make sure it is Set-UID root and
cannot be run by non-trusted users first).

I’m thinking that perhaps the issue is unrelated to hardware acceleration, or maybe error messages are simply uninformative with your current command line. FYI, I’m able to use root with vlc-wrapper without issue, it’s just direct vlc invocation that seems to be an issue.

Under root I execute my own simple video player built on top of libvlc.

Do you see the “Inside NvxLiteH264DecoderLowLatencyInitNvxLiteH264DecoderLowLatencyInit” message when you run vlc-wrapper (for example, with this Hubble video)?

What I observe may help, though I don’t consider it an answer. First, vlc-wrapper does not seem to work on that particular mpg file (I have others I have not tested with yet) when not run as root. Run as root it works; run as non-root the window is just black. When run as root I do not see the NvxLiteH264DecoderLowLatencyInitNvxLiteH264DecoderLowLatencyInit message, but when the screen is black (non-root) I do indeed see it.

Related to this, while running vlc I saw another message that if needing “real-time priorities and/or…” that vlc-wrapper should be used, and also that this wrapper should be given Set-UID root…apparently there is some concern for security for whatever reason. In any case running vlc-wrapper as root works, running vlc directly as root does not seem to be allowed, and running vlc-wrapper as non-root has issues. On a fedora system my vlc-wrapper is NOT SUID root (no CUDA), and still works…

It looks like vlc was intended to be run under root authority but only via the wrapper. How much of this is embedded in the libraries you are using I don’t know…but it seems that understanding how vlc and vlc-wrapper use root and deal with this could be what’s needed to solve the issue. You might need to understand vlc’s philosophy on security and the wrapper before your app will work correctly. It looks like MPEG2 decoding functions correctly, but has some hoops to jump through for security.

I’ve done more debugging and found that there are two possible ways to fix the issue:

  1. Increase the file buffer size with --file-caching XXX, where XXX is >= 1000 (in milliseconds). In this case VLC works fine even under a regular user.
  2. Remove the OpenMAX plugin (under /usr/lib/vlc/plugins/codec/) that uses Jetson hardware decoding.
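The two fixes above could be scripted roughly as follows. This is only a sketch: the function names, the `DRY_RUN` switch and the backup directory are invented for illustration, and the plugin file name is deliberately left to the caller because the exact path is not given in the post. Moving the plugin aside instead of deleting it is a substitution made here so that the change is easy to undo.

```shell
#!/bin/sh
# Fix 1: play a file with a larger file cache.
#   $1 = media file, $2 = cache size in ms (default 1000, the minimum
#   reported to work in this thread).
# With DRY_RUN=1 the command is printed instead of executed.
play_with_file_cache() {
    cache_ms=${2:-1000}
    set -- vlc -I dummy --file-caching "$cache_ms" "$1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Fix 2: move a plugin file out of VLC's plugin tree (reversible).
#   $1 = full path of the plugin file, supplied by the caller
#   $2 = backup directory outside the plugin tree (hypothetical location)
move_plugin_aside() {
    plugin=$1
    backup_dir=$2
    [ -f "$plugin" ] || { echo "no such plugin: $plugin" >&2; return 1; }
    mkdir -p "$backup_dir" && mv -- "$plugin" "$backup_dir"/
}

# Examples (commented out so the file can be sourced safely):
# DRY_RUN=1 play_with_file_cache /data/mpeg2.mp4
# play_with_file_cache /data/mpeg2.mp4 1500
```

For a player built on libvlc, the same caching value could presumably be passed as an option string when creating the instance or the media object, but that is untested here.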

For some reason the CPU usage with the first fix is absolutely the same as with the second one. This means that with the first fix VLC tries to use the hardware decoding but fails, or uses the hardware decoding only for some specific task like deinterlacing. I believe when the hardware decoding is used, the CPU usage should not be high. But now with both fixes it is about 100-200%.

Anyway, I consider the issue fixed. However, the question remains why the CPU usage is so high even when hardware decoding is enabled.

I imagine the display pipeline is complicated…lots of guesses could be made, but using a profiler would take away guessing on bottlenecks and CPU use. Since the message I saw was related to needing “real-time priorities” I suspect root authority is needed as a means of increasing priority (versus increasing buffer instead). CPU use itself would be related to whether data is available…time spent working on data versus waiting for data could have a fairly strong influence on CPU use under any circumstance.

Nope, once again, my own player built on top of libvlc is running under root and has the same issue as running VLC under a regular user.

Although the vlc and vlc-wrapper mention the need for root authority in order to use real-time priorities, is the actual behavior built into libvlc and simply noted on the command line via vlc and vlc-wrapper? It really seems that making your own application with libvlc supporting it would not get around any real-time requirements or other requirements which are integrated within libvlc (I’m assuming the vlc front end is simply noting the issue but not causing it). On my Jetson root worked for me on the vlc-wrapper, while non-root didn’t (with vlc-wrapper NOT SUID root).

This makes me curious…what happens if you run your own application as root reniced to a -1 (increased) priority? Things like blank or missing data could be related to something like a priority inversion if the main process were designed for and expected to start with increased priority. For reference, see the man page for “nice”.

Nope, it doesn’t help, either with -1 or with -10.

It looks like the OpenMAX plugin is the bottleneck. It tries to use the hardware decoder but for some reason works very slowly with it, and VLC discards the decoded frames because they arrive too late to be displayed. This is why VLC prints “picture is too late to be displayed (missing XXX ms.)” and doesn’t display anything. My guess is that setting the file cache to 1000 ms helps because VLC then starts reading a few frames at a time and decoding them simultaneously. In that case the decoded frames are not late and VLC is able to display them.
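To compare runs (for example with and without a larger file cache), the number of late-frame messages in a saved verbose log can be counted. A small sketch, assuming the `-v` output was redirected to a file beforehand; the function name and the log file name are placeholders:

```shell
#!/bin/sh
# Print how many "picture is too late" messages a saved VLC log contains.
#   $1 = path to the log file
count_late_pictures() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| :" keeps the function's status clean.
    grep -c 'picture is too late to be displayed' "$1" || :
}

# Example (commented out; assumes VLC writes its messages to stderr):
# vlc -v -I dummy /data/mpeg2.mp4 2> vlc.log
# count_late_pictures vlc.log
```

A count near zero with the larger cache and a large count without it would be consistent with the late-frame explanation above, though it would not by itself show where the time is spent.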

I’m kind of fishing for clues with this, but if before you run your app you switch to group video (“newgrp video”), does this change anything (you may want to try this with nice -1 as well)?

I don’t think newgrp will change anything, since there are no access restrictions for root. I also tried running my player with a real-time priority of 5, 10, 20 or 40 (chrt --rr 10 ) with no luck. I still suspect there is a bug in the OpenMAX plugin in VLC.
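On the original question of steering VLC away from the hardware decoder without touching group membership: VLC has a `--codec` option that takes a list of preferred decoder modules. Whether listing `avcodec` there is enough to keep the OpenMAX plugin from being selected on this Jetson build is an assumption that has not been verified in this thread; the sketch below only shows how such a test run could be set up. The function name and `DRY_RUN` switch are invented for illustration.

```shell
#!/bin/sh
# Play a file while asking VLC to try the libavcodec (software) decoder first.
#   $1 = media file
# With DRY_RUN=1 the command is printed instead of executed.
# Assumption: preferring avcodec avoids the OpenMAX plugin; untested here.
play_prefer_software() {
    set -- vlc -I dummy --codec avcodec "$1"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example (commented out so the file can be sourced safely):
# play_prefer_software /data/mpeg2.mp4
```

If the “Inside NvxLiteH264DecoderLowLatencyInit…” line no longer appears in the console output with this option, that would suggest the hardware path was not taken.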