NvDrmRenderer setPlane fails with smart TV connected to HDMI

Hi,

We have a product based on the TX1 that decodes video and displays it over HDMI. We are using L4T 32.7.4 (which will be very difficult to update), and some customers have reported green video output over the HDMI connection on certain displays (mainly smart TVs rather than regular monitors). We have reproduced this locally on a Samsung smart TV they recommended, so I am able to debug it locally.

Inside NvDrmRenderer::setPlane, the call to drmModeSetPlane is failing with -EINVAL (-22). The NVIDIA libdrm_nvdc.so library is in use, and the error seems to come directly from that library rather than the kernel: I instrumented the kernel DRM code and it isn't being hit.

The customer has tried every single display mode advertised by their test TV and none of them worked, so I don't think it is related to the mode. I disassembled libdrm_nvdc.so, but it is quite a bit more complicated than the open-source libdrm, so I am not able to debug it further. If you could help me debug it, or provide the source code for libdrm_nvdc.so, that would be very helpful.

Thank you,

Chris Richardson

Hi,
There is a known issue:

Jetson/L4T/r32.7.x patches - eLinux.org
[MMAPI] NvDrmRenderer broken

Not sure if it helps, but you may apply it and give it a try.

Hi DaneLLL,

Thanks for the response and the info. I have seen that post and I did try the patch, but it didn't help. The issue I am seeing is that the initial setPlane call during display initialization fails, whereas the setPlane call that patch addresses happens later, once we are rendering. I tried many combinations of parameters to the initial setPlane call and none of them work.

I also had the customer run some tests we ship in a special test mode, including running the bare 08_video_dec_drm sample with a test video, and that doesn't work either. We normally have X completely disabled and not even present on the system, but the test mode also includes X, and with it we are able to display the GL "particles" demo on the monitor, so the display itself definitely works.

So something in libdrm_nvdc seems to cause this issue, during the initial call to drmModeSetPlane. Do you know if there have been fixes to that library in a newer release? I could try using just a newer version of that one library, or a patched library.

Thanks,

Chris Richardson

I don't have the knowledge to really help on this, but since this is related to DRM, I am curious whether there is any device between the Jetson and the TV (e.g., a VCR, DVD player, or KVM switch). If so, try without it in between; if not, and something is available to add into the chain, try adding it. Just see if anything changes as a result.

I am also curious what your device does. Is it playing video content rather than just acting as a computer? If so, does the problem always occur with that monitor, even for non-media output (e.g., a terminal that isn't rendered as media)?

Hey linuxdev,

Thanks for the response and the questions. There are no devices between the TX1 and the TV, though we have seen issues like that in the past with certain device and display-mode combinations. The issue isn't the physical display or the negotiation between the TX1 and the TV, since everything displays correctly when using X.

It seems to be just this one software component from NVIDIA, which replaces the open-source libdrm with a custom closed-source version. That library is returning -EINVAL for some reason, and I can't work out why from the disassembly. Whatever happens after that, even when we shut down the NvDrmRenderer, something isn't torn down correctly, because the screen stays green until we disconnect the cable or reboot the device.

The device itself does transcoding and mixing of UDP/RTP/RTSP streams and it’s a pretty simple device as far as that goes. It doesn’t act as a general purpose computer.

Thanks,

Chris

Does this occur only when using DRM content? If this works on X, then the GPU driver itself seems to work, so the problem is likely in the library you mention (in this case libdrm_nvdc). The catch is that some of that code is designed to "not work" if the content is thought to be DRM-protected and the hardware chain does not meet requirements. However, if any DRM content plays under X (with DRM enforcement), it means the hardware chain is likely ok. This is only valid if X is enforcing DRM (and I don't know how to check that).

It was mentioned that this is a known issue (though I just now heard about it), but if the source code for the library implementing drmModeSetPlane() is available, then you could recompile it with debug symbols and set a breakpoint at that function, or otherwise get a stack frame at the point of failure, to see what arguments are being passed and where it is failing. That would allow an actual source-code fix, or validate that something upstream was sending the wrong arguments. This is perhaps the most likely route to a fix, since you get a failure or exception thrown there.

Note that the library is where you'd find the specific failing code, but there is no reason you couldn't compile your own program with debug symbols and get a stack frame from that; it wouldn't tell you what went wrong inside the library, but it would show what data is going in, and you could compare it to what you think should be going into the call.

I do not know if the source code for the library is available, but often user space libraries do have available source code.

Hey linuxdev,

Thanks for the information and your answers. I'm sorry, I should have been clearer: DRM in this case stands for Direct Rendering Manager rather than digital rights management. It's the method user space uses to talk to kernel drivers that implement the "direct rendering manager" interface; here it is just used for rendering 2D content to the display output. libdrm is the "standard" Linux library that wraps this so you don't have to send ioctls to the kernel driver yourself. NVIDIA has a custom implementation of libdrm that we don't have source code for, and that more complex implementation is what is returning -EINVAL. The standard library does not return that value from the function being called.

I did a bunch of testing today and discovered it has something to do with the HDR (high dynamic range) parameters being sent to the driver (or to NVIDIA’s custom libdrm_nvdc library).

DaneLLL, when I disable the call to setHDRMetadataSmpte2086 in the NvDrmRenderer constructor, the display output works. The main difference here seems to be that newer/smart displays support HDR metadata, or at least that's the case with the smart TV I'm testing with. I will have our customer run a test as well and see if it fixes their issue. Do you know, off the top of your head, any reason why HDR being enabled would cause issues in libdrm_nvdc?

Thanks,

Chris Richardson

My fault, I don't know that code well enough. The idea of recompiling with debug symbols is still the way to go if it is possible. Your own application could have debug symbols, and you could still get a stack frame and see what the arguments were, but it wouldn't tell you where in the function the actual failure is. If the library itself could be recompiled with debug symbols, then you could get to the specific condition causing this (the debugger could step into the library and point out the specific line in the specific function). @DaneLLL, is there source code for that library which could be recompiled with debug symbols?

Btw, you mentioned using the user-space library to avoid issuing ioctls yourself. This reminded me: there is an strace utility for monitoring system calls, which you probably won't care about here (unless the library is calling an ioctl and getting a failure back from it), but there is a user-space equivalent for library calls: ltrace. I don't think ltrace would show you anything you couldn't get with debug symbols, but sometimes one library calls another, and ltrace can reveal that. Sadly, you would still want to add debug symbols to that final library in that case. Having the library recompiled with debug symbols would do wonders for how fast you find your answer.

Hi,
The public source code covers the nvdrmvideosink plugin and 08_video_dec_drm/. The low-level prebuilt libs are not open source. If disabling the call to setHDRMetadataSmpte2086() works, we would suggest using this as the solution.

Hi DaneLLL,

Thanks for the info, but I'm curious whether you would even plan to fix whatever underlying issue might be here. I understand it probably won't get ported to 32.7.x, but it would be nice to have a proper workaround (even just a set of always-valid HDR parameters that wouldn't cause drmModeSetPlane to fail), rather than just disabling the HDR feature altogether.

Thanks,

Chris Richardson

Hi,
Since we will not have further updates to JetPack 4:
Announcing End of Life for NVIDIA JetPack™ 4 with the release of JetPack 4.6.6

Would suggest disabling the call to setHDRMetadataSmpte2086() as a solution.

I'll make one last suggestion. Do you have the EDID data from that monitor? It doesn't even have to be captured while the monitor is running on the Jetson. If you do, you can see the details by pasting the EDID in here:
https://www.edidreader.com/

If you then have a method to know which mode is failing, you can compare it to those details from edidreader.com. It might point out something unique about that mode.

You might still be able to find the arguments passed to the library via ltrace even though you wouldn’t find a specific line of code in the library.

Hey linuxdev,

Thanks for the additional suggestions and info. I do regularly use strace, and in this case I saw that there are no system calls happening during the failure. I didn't mean to imply that using libdrm is more efficient than using ioctls, just that it is a wrapper that saves you some of the work of talking to the display yourself.

Thanks for the tips regarding ltrace and edidreader. We tried all the modes advertised by the monitor, so I guess they must all carry some HDR flags/metadata that NVIDIA's library doesn't like. I hadn't heard of ltrace, but I will check it out and see if I can find a better solution here than simply not supporting HDR.

Thanks,

Chris Richardson

If you put the EDID into the URL listed, it should tell you about any HDR flags. Any difference might mean the software itself is adding them (technically I could see the possibility of reading the EDID and then editing it prior to use; I don't know of any case where that is done, but it is a possibility).

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.