PXE Failure with R21.2 kernel. Works with R19.3

I have a strange issue with the R21.2 kernel and R21.2 U-Boot, where PXE boot seems to crash with the following error.

I looked at possible memory issues, but it appears U-Boot gives 16 MBytes (0x81000000 vs 0x82000000) of head room for the TFTP kernel transfer. My kernel is only 5.7 MBytes.
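As a quick sanity check of the head room arithmetic, here is a minimal sketch. The two load addresses come from the U-Boot log; the function names are made up for illustration.

```shell
# Distance between the kernel load address and the DTB load address.
headroom_bytes() {
    # $1 = kernel load address, $2 = DTB load address
    echo $(( $2 - $1 ))
}

# Succeeds if an image of the given size fits below the DTB.
fits_below_dtb() {
    # $1 = kernel load address, $2 = DTB load address, $3 = image size in bytes
    [ "$3" -le "$(headroom_bytes "$1" "$2")" ]
}

room=$(headroom_bytes 0x81000000 0x82000000)
echo "headroom: $room bytes ($(( room / 1024 / 1024 )) MiB)"
```

A kernel of roughly 5.7 MBytes is well inside that window, so an overlap between the kernel and the DTB looks unlikely.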

[    5.797787] Unable to handle kernel NULL pointer dereference at virtual address 0000000c

Here is my PXE configuration file:

TIMEOUT 30
DEFAULT primary

MENU TITLE Jetson-TK1 NFS boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /zImage
      FDT /tegra124-jetson_tk1-pm375-000-c00-00.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/nfs rw netdevwait ip=192.168.3.3:192.168.3.1:192.168.3.1:255.255.255.0::eth0:off nfsroot=192.168.3.1:/home/enewnham/tegraTarget/mrirootfs/ rootwait

R19.3 Working log http://pastebin.com/XUwuRtFA

R21.1 Failure log http://pastebin.com/xQJCAs3Z

Long ago I worked on some grub bootloader issues to develop diskless beowulf cluster management software…so I’m very rusty on PXE and have no experience on u-boot for PXE. So what I did was simply to compare the two logs to see what sticks out.

I noticed that the download speeds were very different. U-boot itself has to have the NIC driver, so differences could be due to that. In any case the exact byte size transfer of the DTB is the same so it looks like downloading probably works properly and certainly does in the DTB case at 56739 bytes. Since both DTB are the same it seems that hardware enumerations to the kernel are the same (it would be interesting to diff the DTS files to see if anything truly changed…it would be of minor interest to sha1sum the DTB files and see if they are exact matches). The important thing to note about the speed difference though is that there are differences between the u-boot source on R19 and R21 (it is known that some changes were made which altered how SD card and SATA boot…probably also modifying PXE boot as well).
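The checksum comparison suggested above could be sketched like this. The function name and the file paths in the usage note are hypothetical; only `sha1sum` itself is a real tool.

```shell
# Succeeds if both files have the same SHA-1 checksum.
same_checksum() {
    # $1, $2 = paths to the two files to compare
    a=$(sha1sum < "$1") || return 2
    b=$(sha1sum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_checksum r19/board.dtb r21/board.dtb && echo identical` (placeholder paths) would report whether the two DTB files are byte-for-byte equal as far as SHA-1 can tell.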

The byte size is not constant between kernels, but this is expected since the kernel changed between R19 and R21. You could compare the byte sizes listed in the logs to the exact byte sizes in your PXE server directories and verify that the logged sizes match the kernel sizes…but all looks good up to the point where software is downloaded.
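A minimal sketch of that size check, assuming you copy the "Bytes transferred" number out of the U-Boot log by hand. The function name is made up for illustration.

```shell
# Succeeds if the file on the TFTP server has exactly the logged size.
size_matches() {
    # $1 = file on the TFTP server, $2 = "Bytes transferred" value from the log
    actual=$(wc -c < "$1") || return 2
    [ "$actual" -eq "$2" ]
}
```

Something like `size_matches /tftpboot/zImage 6161224` (path and number are placeholders) would fail if the transfer was truncated or a different file was served.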

Failure occurs during USB setup…this kernel driver is loading here:

usbcore: registered new interface driver usbhid

Here are the successful lines in R19 which never occur in R21:

usbhid: USB HID core driver
tegra-hier-ictlr tegra-hier-ictlr: probed

It looks like the setup prior to USB loading succeeds and that the error is in the HID device driver (USB loads as a communications protocol, and then the HID driver loads as a dependency for USB human interface devices…keyboards, mice, joysticks). So although it is not a guarantee, it very much seems that USB loaded correctly and then triggered the load of specific device support over USB, where HID devices failed.

Were you trying to boot without keyboard/mouse? These are HID devices. If your R19-versus-R21 PXE boots had different USB devices connected then it is very plausible that the HID device driver code incorrectly fails to account for boot without keyboard (or maybe mouse). This was actually an issue for me over a decade ago under GRUB…failure to handle no keyboard at all…I had to rewrite part of GRUB to handle this. NULL pointer dereferences would tend to agree with the possibility of executing uninitialized code due to missing pointers to hardware (hardware in this case being a keyboard).

I did check the networking and TFTP; as you said, it does look to be working. I noticed that I still had the USB debug cable connected, so I disconnected that. I also tried with a USB wireless mouse, but alas, similar results.

I am currently comparing R19 with R21 to see if there is anything obvious…

R21.2 with a wireless mouse http://pastebin.com/y4Fc7Mxh

R21.2 without debug USB connected http://pastebin.com/Zw5J6Zry

You can see when I disconnected the USB, the line

tegra-hier-ictlr tegra-hier-ictlr: probed

Let me ask specifically about the keyboard. Is a keyboard attached? Mice are always optional, bugs I’ve seen sometimes make a keyboard mandatory. If a keyboard is attached, is it to the full-sized USB connector? If not, then try with a USB keyboard…avoid a wireless keyboard for testing, as this complicates things if the wireless has any issues.

Rgr, just tried with a basic USB keyboard: http://pastebin.com/9WyMq2uW

Didn’t get time to check kernel config parameters last night, but hopefully I can find something today…

I do appreciate the help!

I was able to duplicate the problem. It is crashing in tegra_hier_ictlr_irq_handler() in drivers/platform/tegra/hier_ictlr.c. This is new code that did not exist in 19.3, so that is why it does not show up there.

The interrupt handler tries to get the handle to the driver’s private data with:

struct tegra_hier_ictlr *ictlr = dev_get_drvdata(dev);

Adding debug here shows the pointer is NULL. Then if you look down in the driver’s probe routine, you see that this pointer is set up by calling dev_set_drvdata() after the irq_init routine is called:

ret = tegra_hier_ictlr_irq_init(pdev, ictlr);
if (ret)
    return ret;

tegra_hier_ictlr_create_sysfs(pdev);

dev_notice(&pdev->dev, "probed\n");

dev_set_drvdata(&pdev->dev, ictlr);

So I believe the problem is the interrupt is firing before the private pointer is initialized. I reversed the order of the above calls and the pointer is now valid but then the irq handler reports an error reading the status register and the kernel hangs when it hits the BUG() macro:

status = readl(ictlr->mselect_base + MSELECT_ERROR_STATUS_0);
if (status != 0) {
    pr_err("MSELECT error detected! status=0x%x\n", (unsigned int)status);
    BUG();
}

Not sure why this only happens with PXE boot, other than that the boot timing is probably significantly different when booting over the network and/or a lot of interrupts are happening with the network traffic. Note that I did my test connected to a 100 Mbit switch, given the known problems trying to use the network at 1 Gbit speed.

Perhaps someone from NVIDIA can shed some light on this?

Can someone from nvidia please respond to this?

Hi RickDL, I’ve forwarded your problem to some people internally.

ENewnham,

Pls specify if you are using pxe boot or NFS boot or tftp boot,
All these three boot methods are different in some respect.

Coming to the error you have posted, it is a bit vague, doesn’t provide any valid clue of what is happening,
Pls post more boot log to see what the core issue is.

Pls mention where you got the pxe conf file from. It doesn’t seem to be a pxe file, but an nfs conf file.

If so, have you exported the filesystem through /etc/exports file? and then restarted the nfs kernel server?

I have the below extlinux.conf file and nfs and tftpboot works solid.

Ex:

TIMEOUT 30
DEFAULT primary

MENU TITLE Jetson-TK1 NFS boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/nfs rw netdevwait ip=:::::eth0:on nfsroot=10.19.66.10:/media/Linx1/wrk_bench/l4t/k310/Rel21/jetson-tk1/R21.2/full_linux_for_tegra/Linux_for_Tegra/rootfs rootwait

and flashed with

./flash.sh -N 10.19.66.10:/media/Linx1/wrk_bench/l4t/k310/Rel21/jetson-tk1/R21.2/full_linux_for_tegra/Linux_for_Tegra/rootfs jetson-tk1 eth0

NOTE: One should have rootfs path exported in /etc/exports, and a restart of NFS kernel server
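A small sketch for checking the export before blaming the kernel. The function name is made up, and it only looks at the first field of each line in the exports file, so it does not validate the client list or options.

```shell
# Succeeds if the given path has an entry in the exports file.
is_exported() {
    # $1 = rootfs path, $2 = exports file (defaults to /etc/exports)
    exports_file=${2:-/etc/exports}
    # Skip comment lines; compare the first field, ignoring a trailing slash.
    awk -v p="${1%/}" '
        /^[[:space:]]*#/ { next }
        { f = $1; sub(/\/$/, "", f); if (f == p) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$exports_file"
}
```

After editing the file, `sudo exportfs -ra` re-reads it on the server, and `showmount -e <server>` run from another machine should list the path if the export is active.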

In either case, pls provide full boot log, for more/better analysis

sundeep, thanks for your response!

I am actually using PXE boot, which allows me to completely boot the board over the network. This requires only flashing UBOOT and the partition tables.

To perform this task I flashed u-boot and the partition tables then copied my zImage and DTB file to a tftp directory. An easy way to get u-boot is just flash the board using the Linux_For_Tegra instructions, you can then catch U-Boot at the prompt and manually run pxe boot.

I followed these instructions for the TFTP server: “How do I install and run a TFTP server?” (Ask Ubuntu)

cd Linux_for_Tegra/kernel
mkdir /tftpboot/boot
cp zImage /tftpboot/boot
cp tegra124-jetson_tk1-pm375-000-c00-00.dtb /tftpboot/boot

I then wrote this to my extlinux.conf

TIMEOUT 30
DEFAULT primary

MENU TITLE Jetson-TK1 NFS boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/nfs rw netdevwait ip=192.168.3.3:192.168.3.1:192.168.3.1:255.255.255.0::eth0:off nfsroot=192.168.3.1:/home/enewnham/tegraTarget/mrirootfs/ rootwait

then I performed

mkdir /tftpboot/pxelinux.cfg
cp extlinux.conf /tftpboot/pxelinux.cfg/default
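Before booting the board, it may be worth confirming that everything the PXE config refers to is actually readable under the TFTP root. A minimal sketch, with a made-up function name:

```shell
# Reports every listed file that is missing or unreadable under the TFTP root.
check_tftp_tree() {
    # $1 = TFTP root; remaining arguments = paths relative to that root
    root=$1
    shift
    missing=0
    for rel in "$@"; do
        if [ ! -r "$root/$rel" ]; then
            echo "missing or unreadable: $root/$rel" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the layout used here, that would be `check_tftp_tree /tftpboot pxelinux.cfg/default boot/zImage boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb`.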

then I booted the board normally, and caught the board at U-Boot prompt.

Tegra124 (Jetson TK1) # setenv bootcmd_pxe "if pxe get; then pxe boot; fi"
Tegra124 (Jetson TK1) # run bootcmd_pxe

here are my logs:

R19.3 working: “R19.3 working PXE Boot” (Pastebin.com)
R21.2 failure: “R21.2 Failed PXE boot” (Pastebin.com)

I am sure my /etc/exports are correct, and the files do transfer over correctly. Further, I have tried a different laptop to the same effect. Please let me know if you need any assistance on getting PXE booting to work.

edit: Also RickDL has done a good job at nailing down the problem further.

It appears like the logs I obtained don’t provide enough detail. If you have a kernel and/or commands with more verbosity I’d be happy to try.

Can you try booting the original kernel or the Grinch kernel (the exact same binaries)? Even if they wouldn’t have support for NFS root, it would be interesting to see if they suffer from this same kernel oops.

ENewnham,

Thanks for providing more info.
There are a lot of differences b/n R19.3 and R21.1, and hence to R21.2
Pls try to reflash R21.2 release afresh, then check the things to work.
literally many differences (Pls check release notes of R21.1)

It is working for me,
Pls check why eth0 has off in your case; shouldn’t it be on?
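For reference, the kernel's nfsroot documentation gives the ip= field order as client:server:gateway:netmask:hostname:device:autoconf. There, "off" means static configuration from the given fields and "on" means autoconfiguration such as DHCP. The sketch below just splits the value so the two configs in this thread can be compared side by side; the function name is made up.

```shell
# Splits an ip= kernel parameter value into its named fields.
describe_ip_param() {
    # $1 = value of ip= (without the "ip=" prefix)
    IFS=: read -r client server gateway netmask hostname device autoconf _rest <<EOF
$1
EOF
    echo "client=$client server=$server gateway=$gateway netmask=$netmask device=$device autoconf=$autoconf"
}

describe_ip_param '192.168.3.3:192.168.3.1:192.168.3.1:255.255.255.0::eth0:off'
describe_ip_param ':::::eth0:on'
```

Read this way, the first value carries a full static address set with autoconf off, while the second leaves the addresses empty and relies on autoconfiguration.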

Tegra124 (Jetson TK1) # set serverip 10.19.66.10                                                                                                                                      
Tegra124 (Jetson TK1) # set ipaddr 10.19.66.111                                                                                                                                       
Tegra124 (Jetson TK1) # setenv bootcmd_pxe "if pxe get; then pxe boot; fi"                                                                                                            
Tegra124 (Jetson TK1) # run bootcmd_pxe                                                

Retrieving file: pxelinux.cfg/default                                                                                                                                                 
Using RTL8169#0 device                                                                                                                                                                
TFTP from server 10.19.66.10; our IP address is 10.19.66.111                                                                                                                          
Filename 'pxelinux.cfg/default'.                                                                                                                                                      
Load address: 0x90100000                                                                                                                                                              
Loading: #                                                                                                                                                                            
         471.7 KiB/s                                                                                                                                                                  
done                                                                                                                                                                                  
Bytes transferred = 967 (3c7 hex)                                                                                                                                                     
Config file found                                                                                                                                                                     
Jetson-TK1 NFS boot options                                                                                                                                                           
1:      primary kernel                                                                                                                                                                
Enter choice: 1:        primary kernel                                                                                                                                                
missing environment variable: bootfile                                                                                                                                                
Retrieving file: /boot/zImage                                                                                                                                                         
Using RTL8169#0 device                                                                                                                                                                
TFTP from server 10.19.66.10; our IP address is 10.19.66.111                                                                                                                          
Filename '/boot/zImage'.                                                                                                                                                              
Load address: 0x81000000                                                                                                                                                              
Loading: #################################################################                                                                                                            
         #################################################################                                                                                                            
         #################################################################                                                                                                            
         #################################################################                                                                                                            
         #################################################################                                                                                                            
         #################################################################                                                                                                            
         ##############################                                                                                                                                               
         2.3 MiB/s                                                                                                                                                                    
done                                     
Bytes transferred = 6161224 (5e0348 hex)                                                                                                                                              
append: console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43
:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Ada
pter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x
02:0x43:0x00 root=/dev/nfs rw netdevwait ip=:::::eth0:on nfsroot=10.19.66.10:/media/Linx1/wrk_bench/l4t/k310/Rel21/jetson-tk1/ER-2015-01-20-k310-l4t-l4t-r21_r21.3_RC1/full_linux_for_
tegra/Linux_for_Tegra/rootfs rootwait                                                                                                                                                 
missing environment variable: bootfile                                                                                                                                                
Retrieving file: /boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb                                                                                                                       
Using RTL8169#0 device                                                                                                                                                                
TFTP from server 10.19.66.10; our IP address is 10.19.66.111                                                                                                                          
Filename '/boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb'.                                                                                                                            
Load address: 0x82000000                                                                                                                                                              
Loading: ####                                                                                                                                                                         
         2.3 MiB/s                                                                                                                                                                    
done                                                                                                                                                                                  
Bytes transferred = 57167 (df4f hex)                                                                                                                                                  
Kernel image @ 0x81000000 [ 0x000000 - 0x5e0348 ]                                                                                                                                     
## Flattened Device Tree blob at 82000000                                                                                                                                             
   Booting using the fdt blob at 0x82000000                                                                                                                                           
   Using Device Tree in place at 82000000, end 82010f4e                                                                                                                               
                                                                                                                                                                                      
Starting kernel ...                                    
....
[    0.000000] Kernel command line: console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmub
oard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartpo
rt=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0
 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/nfs rw netdevwait ip=:::::eth0:on nfsroot=10.19.66.10:/media/Linx1/wrk_bench/l4t/k310/Rel21/jetson-tk1/r21/full_linux_for_tegra/Linux_for_Tegra/rootfs rootwait       
....

[    8.512236] r8169 0000:01:00.0 eth0: link down                                                                                                                                     
[    8.518370] r8169 0000:01:00.0 eth0: link down                                                                                                                                     
[    8.524582] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready                                                                                                                     
[   10.100811] r8169 0000:01:00.0 eth0: link up                                                                                                                                       
[   10.107393] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready                                                                                                                
[   10.112080] Sending DHCP and RARP requests ., OK                                                                                                                                   
[   10.139063] IP-Config: Got DHCP answer from 10.19.64.3, my address is 10.19.65.6                                                                                                   
[   10.151637] IP-Config: Complete:                                                                                                                                                   
[   10.157181]      device=eth0, hwaddr=00:04:4b:1e:07:ef, ipaddr=10.19.65.6, mask=255.255.252.0, gw=10.19.64.1                                                                       
[   10.171767]      host=10.19.65.6, domain=nvidia.com, nis-domain=(none)                                                                                                             
[   10.180795]      bootserver=10.19.81.77, rootserver=10.19.66.10, rootpath=                                                                                                         
[   10.187681]      nameserver0=10.19.84.101, nameserver1=10.19.84.102, nameserver2=10.25.20.252                                                                                      
[   10.204117] ALSA device list:                                                                                                                                                      
[   10.209492]   #0: HDA NVIDIA Tegra at 0x70038000 irq 113                                                                                                                           
[   10.217239]   #1: tegra-rt5639                                                                                                                                                     
[   10.341024] VFS: Mounted root (nfs filesystem) on device 0:11.                                                                                                                     
[   10.350087] devtmpfs: mounted                                                                                                                                                      
[   10.355867] Freeing unused kernel memory: 504K (c0b38000 - c0bb6000)                                                                                                               
[   10.686303] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.                             
[   11.707026] init: plymouth-upstart-bridge main process (115) terminated with status 1                                                                                              
[   11.720278] init: plymouth-upstart-bridge main process ended, respawning                      
...
...
                                                                                                                                                                                      
Ubuntu 14.04.1 LTS tegra-ubuntu ttyS0                                                                                                                                                 
                                                                                                                                                                                      
tegra-ubuntu login: root (automatic login)                                                                                                                                            
                                                                                                                                                                                      
root@tegra-ubuntu:~#

My /tftpboot/pxelinux.cfg/default file being

TIMEOUT 30
DEFAULT primary

MENU TITLE Jetson-TK1 NFS boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson_tk1-pm375-000-c00-00.dtb
      APPEND console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 mem=2015M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/nfs rw netdevwait ip=:::::eth0:on nfsroot=10.19.66.10:/media/Linx1/wrk_bench/l4t/k310/Rel21/jetson-tk1/r21/full_linux_for_tegra/Linux_for_Tegra/rootfs rootwait

sundeep,

Thanks for your information, I have nailed it down to U-Boot

Here are the steps I took. Using the provided u-boot.bin, I managed to get PXE boot to work. However, when I build from the provided source code http://developer.download.nvidia.com/mobile/tegra/l4t/r21.2.0/sources/u-boot_src.tbz2 it doesn’t work…

mkdir ./tmp
cp ~/Downloads/u-boot_src_21_2.tbz2 .
tar -xvjf u-boot_src_21_2.tbz2
cd u-boot
export ARCH=arm
export CROSS_COMPILE=<insert your cross compiler here>
make jetson-tk1_config
make all
# backup provided u-boot.bin for later
cp ../Linux_for_Tegra/bootloader/ardbeg/u-boot.bin ../Linux_for_Tegra/bootloader/ardbeg/u-boot.bin.orig
# copy built u-boot.bin
cp u-boot-dtb-tegra.bin ../Linux_for_Tegra/bootloader/ardbeg/u-boot.bin
cd ../Linux_for_Tegra
# flash built u-boot.bin
sudo ./flash.sh -k EBT jetson-tk1 mmcblk0p1
# this crashes

Doesn’t work… But if I flash the provided binary instead, it starts working again.

# restore provided u-boot.bin
cp Linux_for_Tegra/bootloader/ardbeg/u-boot.bin.orig Linux_for_Tegra/bootloader/ardbeg/u-boot.bin
# flash provided u-boot.bin
sudo ./flash.sh -k EBT jetson-tk1 mmcblk0p1
# this works

Are you working off different source code? Do you recommend a git repository or tag that does work? Maybe it’s a compiler issue? I didn’t have any build errors when I built U-Boot.

Many Thanks,
Elliot

Can someone from nvidia check into this?

I believe that the U-Boot source code on the webpage isn’t the same as the released binary for R21.2

Which U-Boot source code created the released binary? Perhaps this is a compiler issue?

By “doesn’t work” and “this crashes” you mean that it crashes exactly in the same way as your earlier 21.1 boot log shows?

The package includes source_sync.sh that syncs also the u-boot source.

You can download it here, if you want to try that:
http://nv-tegra.nvidia.com/gitweb/?p=3rdparty/u-boot.git;a=summary

It should be quite easy to verify that at least the .tbz2 and the git tree matches.
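One way to do that comparison, as a rough sketch: unpack the .tbz2 into one directory, check out the tag in another, and let `diff` list anything that differs. The function name is made up, and `--exclude` assumes GNU diff.

```shell
# Succeeds (and prints nothing) if the two trees have identical contents.
trees_match() {
    # $1 = directory unpacked from the .tbz2, $2 = git checkout of the same release
    diff -rq --exclude=.git "$1" "$2"
}
```

Any line of output names a file that differs or exists on only one side.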

Which compiler are you using?

Did you follow the compilation instructions to the letter (I’m asking because your steps above don’t match them exactly)?

Sorry for the long period of silence. This issue still exists.

I tried compiling tags tegra-l4t-r21-21.2 & tegra-l4t-r21-er-2015-02-02

both of which failed to PXE boot. I am using the Ubuntu compiler given to me via JetPack. I have taken your advice and performed word for word what is described in the NVIDIA Tegra Linux Driver Package documentation, but alas it had no effect.

The provided binary U-Boot has the version info (This binary works)

U-Boot 2014.10-rc2-00001-g9f88c9e (Dec 01 2014 - 14:29:15)

However, the compiled U-Boot has the version info (This source code doesn’t)

U-Boot 2014.10-rc2-dirty (Feb 13 2015 - 09:59:41)
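The two banners differ in their suffixes. This sketch assumes they follow the usual git-describe style, where `-g<hash>` names the commit and `-dirty` marks a tree with uncommitted changes at build time. I have not confirmed how this U-Boot tree generates its version string, and the function name is made up.

```shell
# Pulls the commit hash and dirty flag out of a U-Boot banner line.
uboot_version_info() {
    # $1 = U-Boot banner line as printed at boot
    case $1 in
        *-dirty*) dirty=yes ;;
        *)        dirty=no ;;
    esac
    # Abbreviated commit hash following "-g", if the banner has one.
    commit=$(printf '%s\n' "$1" | sed -n 's/.*-g\([0-9a-f]\{7,\}\).*/\1/p')
    echo "commit=${commit:-none} dirty=$dirty"
}

uboot_version_info 'U-Boot 2014.10-rc2-00001-g9f88c9e (Dec 01 2014 - 14:29:15)'
uboot_version_info 'U-Boot 2014.10-rc2-dirty (Feb 13 2015 - 09:59:41)'
```

If that reading is right, the working binary was built one commit past the tag at 9f88c9e, which may be worth looking for in the git tree. `git status` in the build directory would show what makes the local build dirty.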