Problems with DSI output moving from L4T R28.1 to L4T R32.1

We have a carrier board that drives a third HDMI output by sending the DSI
output to a DSI to HDMI converter chip. We were able to configure and run
the third HDMI using L4T R28.1/kernel 4.4. We are trying to upgrade to
L4T R32.1/kernel 4.9. I was able to adjust the device tree so the frame
buffer is created, but the frame buffer can’t be written to, and the X Server
won’t start. Note that the two HDMI outputs are working properly (thanks to
helpful information here).

In more detail:

  1. I copied the working DSI device-tree configuration sections from the previous
    version. The device init recognizes them, and there don’t seem to be any errors
    reading the device tree. Note that I enabled the OF debug output in
    nvidia/drivers/video/tegra/dc/of_dc.c. The kernel reports the following at startup:
[    0.473159] iommu: Adding device 15220000.nvdisplay to group 34
[    0.473320] platform 15220000.nvdisplay: OF IOVA linear map 0x9607b000 size (0x800000)
[    0.473431] platform 15220000.nvdisplay: OF IOVA linear map 0x96078000 size (0x2008)
[    0.949491] OF_DC_LOG: valid_heads 3
[    0.951453] OF_DC_LOG: dc controller index 0
[    0.951481] tegradc 15200000.nvdisplay: disp0 connected to head0->/host1x/dsi
[    0.951612] OF_DC_LOG: fb bpp 32
[    0.951633] OF_DC_LOG: fb flip on probe
[    0.951671] OF_DC_LOG: dsi controller vs DSI_VS_1
[    0.951691] OF_DC_LOG: Enable hs clock in lp mode 1
[    0.951723] OF_DC_LOG: n data lanes 4
[    0.951743] OF_DC_LOG: dsi video NONE_BURST_MODE_WITH_SYNC_END
[    0.951763] OF_DC_LOG: dsi pixel format 24BIT_P
[    0.951782] OF_DC_LOG: dsi refresh rate 60
[    0.951800] OF_DC_LOG: dsi rated refresh rate 60
[    0.951818] OF_DC_LOG: dsi virtual channel 0
[    0.951838] OF_DC_LOG: dsi instance 0
[    0.951856] OF_DC_LOG: dsi panel reset 1
[    0.951874] OF_DC_LOG: dsi panel te polarity low 1
[    0.951894] OF_DC_LOG: dsi panel lp00 pre panel wakeup 1
[    0.951914] OF_DC_LOG: dsi ganged_type 0
[    0.951965] OF_DC_LOG: dsi ganged write to both links 1
[    0.952004] OF_DC_LOG: dsi split link type 0
[    0.952022] OF_DC_LOG: dsi suspend_aggr 3
[    0.952047] OF_DC_LOG: dsi power saving suspend 0
[    0.952065] OF_DC_LOG: dsi ulpm_not_supported 1
[    0.952083] OF_DC_LOG: dsi video type VIDEO_MODE
[    0.952101] OF_DC_LOG: dsi video clock mode CONTINUOUS
[    0.952121] OF_DC_LOG: dsi n_init_cmd 16
[    0.952147] OF_DC_LOG: dsi n_suspend_cmd 6
[    0.952272] OF_DC_LOG: boardinfo platform_boardid = 0 platform_boardversion = 0 display_boardid = 0 display_boardversion = 0
[    0.952345] OF_DC_LOG: dc flag 1
[    0.952365] OF_DC_LOG: out_width 216
[    0.952385] OF_DC_LOG: out_height 135
[    0.952404] OF_DC_LOG: out_rotation 0
[    0.952435] tegradc 15200000.nvdisplay: No hpd-gpio in DT
[    0.952478] OF_DC_LOG: default_out flag 0
[    0.952503] OF_DC_LOG: parent clk pll_d
[    0.952523] OF_DC_LOG: framebuffer xres 1920
[    0.952541] OF_DC_LOG: framebuffer yres 1080
[    0.952627] OF_DC_LOG: of pclk 148500000
[    0.952720] OF_DC_LOG: fb window Index 0
[    0.952743] OF_DC_LOG: win mask 0x3
[    0.952766] OF_DC_LOG: cmu enable 1
[    0.952785] tegradc 15200000.nvdisplay: DT parsed successfully
[    0.952877] tegradc 15200000.nvdisplay: Display dc.ffffff800b610000 registered with id=0
[    0.964470] tegradc 15200000.nvdisplay: vblank syncpt # 8 for dc 0
[    0.964499] tegradc 15200000.nvdisplay: vpulse3 syncpt # 9 for dc 0
[    0.969918] tegradc 15200000.nvdisplay: DSI: HS clock rate is 445500
[    0.973498] tegradc 15200000.nvdisplay: probed
[    3.627714] tegradc 15200000.nvdisplay: fb registered
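As a sanity check on what actually made it into the booted device tree (independent of the of_dc.c parsing above), the live tree can be decompiled. This is just a sketch; it assumes dtc is installed and that the DSI panel properties use the usual nvidia,dsi-* naming:

# decompile the device tree the kernel actually booted with
dtc -I fs -O dts /proc/device-tree > booted.dts 2>/dev/null
grep -n 'nvidia,dsi-' booted.dts | head -n 40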

There is a /dev/fb0 device entry for the dsi frame buffer (as well as
/dev/fb1 and /dev/fb2 for the two direct HDMI outputs).

With Xorg disabled, it is not possible to write to /dev/fb0 (although I can write
to /dev/fb1 and /dev/fb2).
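For reference, the direct write test was something along these lines (a minimal sketch; filling from /dev/urandom just produces visible noise on a working framebuffer):

# with Xorg stopped, write raw data straight to each fbdev node
cat /dev/urandom > /dev/fb1   # works: noise appears on the first HDMI output
cat /dev/urandom > /dev/fb2   # works: noise appears on the second HDMI output
cat /dev/urandom > /dev/fb0   # fails on the DSI framebuffer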

There is a /sys/class/graphics/fb0 entry created for this device. One difference
stands out between the entry for the DSI frame buffer and the entries for the
two HDMI devices:

/sys/class/graphics/fb0/state = 1
/sys/class/graphics/fb1/state = 0
/sys/class/graphics/fb2/state = 0

Even after some digging, it was not obvious to me where this state field gets set or how to interpret it.
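For what it's worth, my reading of the generic fbdev sysfs code (drivers/video/fbdev/core/fbsysfs.c in mainline, not the tegra driver) is that this attribute simply mirrors fb_info->state, with 0 meaning FBINFO_STATE_RUNNING and 1 meaning FBINFO_STATE_SUSPENDED, and that writing to it goes through fb_set_suspend(). Treat the following as an experiment based on that assumption, not a documented interface:

# assumption: 1 == FBINFO_STATE_SUSPENDED, 0 == FBINFO_STATE_RUNNING
cat /sys/class/graphics/fb0/state        # shows 1 for the DSI framebuffer
echo 0 > /sys/class/graphics/fb0/state   # try forcing it back to the running state
cat /dev/urandom > /dev/fb0              # then repeat the write test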

None of the other entries showed any obvious problem.

The stopping point for us is that Xorg will not start:

...
(**) NVIDIA(0): Enabling 2D acceleration
(II) Loading /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so
(II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
        compiled for 4.0.2, module version = 1.0.0
(II) NVIDIA GLX Module  32.1.0  Release Build  (integ_stage_rel)  (buildbrain@mobile-u64-2988)  Wed Mar 13 00:25:11 PDT 2019
(EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
(EE) NVIDIA(0): Failing initialization of X screen 0
(II) Unloading glxserver_nvidia
(EE) Screen(s) found, but none have a usable configuration.
(EE) 
Fatal server error:
(EE) no screens found(EE) 
(EE) 
Please consult the The X.Org Foundation support 
         at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE) 
(EE) Server terminated with error (1). Closing log file.

Note that without the DSI enabled, Xorg starts and runs properly on the two HDMI displays.

Note also that I have allocated two windows to each display head. From /sys/firmware/devicetree/…,
here’s a hexdump -C of the win-mask entries from the device tree.

host1x/nvdisplay@15200000/win-mask
00000000  00 00 00 03                                       |....|
00000004
host1x/nvdisplay@15210000/win-mask
00000000  00 00 00 0c                                       |....|
00000004
host1x/nvdisplay@15220000/win-mask
00000000  00 00 00 30                                       |...0|
00000004
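(For completeness, the dumps above came from a loop like the one below; /sys/firmware/devicetree/base is the standard location for the live tree, and the node names are the ones shown above, so adjust if your host1x path differs.)

# dump the win-mask property of each display head from the booted device tree
for n in 15200000 15210000 15220000; do
    echo "host1x/nvdisplay@$n/win-mask"
    hexdump -C /sys/firmware/devicetree/base/host1x/nvdisplay@$n/win-mask
done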

I tried to enable kernel debug messages in nvidia/drivers/video/tegra/dc/dsi.c, but that
didn’t really provide any additional information.
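One thing that may help others (an assumption on my part; it requires CONFIG_DYNAMIC_DEBUG in the kernel config): the dev_dbg()/pr_debug() messages in these files can also be enabled at runtime through dynamic debug instead of editing the source, roughly:

# enable pr_debug()/dev_dbg() output from the tegra dc/dsi sources at runtime
# (requires CONFIG_DYNAMIC_DEBUG; file names match the nvidia source layout above)
mount -t debugfs none /sys/kernel/debug 2>/dev/null
echo 'file dsi.c +p'   > /sys/kernel/debug/dynamic_debug/control
echo 'file of_dc.c +p' > /sys/kernel/debug/dynamic_debug/control
dmesg | grep -iE 'dsi|tegradc' | tail -n 50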

Any information on additional debugging options would be helpful.

Thanks in advance,

Cary

Update: The problem with X11/Xorg was an old shared library from 28.1 being installed
during the application install process (a quick way to search for such leftovers is
sketched after the xrandr output below). With this removed, Xorg starts, and xrandr
reports the DSI display as operational.

~# xrandr -q
Screen 0: minimum 8 x 8, current 5200 x 1080, maximum 32767 x 32767
DSI-0 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1920x1080     60.00*+
HDMI-0 connected 1920x1080+1920+0 (normal left inverted right x axis y axis) 600mm x 340mm
   1920x1080     60.00*+  59.95    50.00  
   1680x1050     59.96  
 
 .. more modes
HDMI-1 connected 1360x768+3840+0 (normal left inverted right x axis y axis) 480mm x 270mm
   1920x1080     60.00 +  59.95    50.00    50.00  
   1680x1050     59.96  
.. more modes

There is no video output though. I will have to coordinate
with the hardware engineer to find out if he can provide some insight into the problem.
I will update the thread with anything I find out.
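(For reference, a quick way to look for stale 28.x libraries like the one mentioned above. The paths and patterns are assumptions based on the usual L4T layout, where the NVIDIA userspace libraries carry the release number in the .so version:)

# search for NVIDIA libraries still carrying the old 28.1 release suffix
find /usr/lib -name '*.so.28.1*' 2>/dev/null
# and check what is actually in the tegra driver directory
ls -l /usr/lib/aarch64-linux-gnu/tegra/ | grep '28\.1'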

Update 2

(In case someone searches: X server (Xorg) won’t start with new kernel)

This was more than just the incorrect library.

The kernel in the sample rootfs expects its modules to be in /lib/modules/4.9.140-tegra.

The kernel built from the NVIDIA-provided sources is named 4.9.140 (without the -tegra
suffix), so modprobe looks in /lib/modules/4.9.140, and the nvgpu module wasn't getting loaded.

I fixed this with a symlink:

cd /lib/modules
ln -s 4.9.140-tegra 4.9.140

Not sure why the name is different, and there is probably a better solution, but
that will do for us right now.

Thanks,

Cary

This is just a standard step that all kernel builds must be told to use (it is not an NVIDIA customization; it is simply tradition among Linux kernel builds to set CONFIG_LOCALVERSION to some useful value so that module directories for different kernels don't get mixed up). The post below covers more than CONFIG_LOCALVERSION, but it discusses that and “uname -r”:
https://devtalk.nvidia.com/default/topic/1057246/jetson-tx1/about-kernel/post/5381591/#5381591
(just skip down to the “For actual procedures…” part)
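As a concrete sketch of that (the source directory and defconfig name are examples from the usual L4T kernel layout; adjust to your setup), building with the -tegra local version keeps “uname -r” and the module directory in agreement:

# build the kernel so "uname -r" reports 4.9.140-tegra and modules_install
# populates /lib/modules/4.9.140-tegra (paths and defconfig are assumptions;
# add ARCH=arm64 CROSS_COMPILE=... if building on a host PC)
cd kernel/kernel-4.9
make O=$PWD/build tegra_defconfig
make O=$PWD/build LOCALVERSION=-tegra -j"$(nproc)" Image modules
sudo make O=$PWD/build LOCALVERSION=-tegra modules_install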

Note that every change to a built-in kernel feature essentially produces a new kernel, which can be entirely different from another kernel. Having the same source code in no way produces the same kernel unless the configuration itself also matches. The kernel source cannot label itself as to the purpose of any given configuration (only the person choosing the configuration can know that).