I ran some further tests and came up with a workaround for the issue.
I had been using a customized kernel, so I started by comparing it against the stock L4T kernel and configuration, and found I couldn’t reproduce the issue with the stock kernel. I also noticed that with my custom kernel, the problem appeared when I reduced kernel logging to the console with loglevel=3 on the kernel command line, but the SATA drive was always detected when I left all logging on. Leaving full logging enabled wasn’t really a satisfactory fix, though, so I did some further digging.
One thing I noted was that the SSDs I’m testing all support DEVSLP. Since the problem manifested itself at device initialization time, I wondered if there was some kind of interaction between the Tegra’s power management and the drives’ DEVSLP feature. The following patch appears to work around the problem:
diff --git a/drivers/ata/ahci-tegra.c b/drivers/ata/ahci-tegra.c
index 138de56..55d8f4d 100644
@@ -3065,7 +3065,7 @@ static int tegra_ahci_init_one(struct platform_device *pdev)
 		if (!(hpriv->port_map & (1 << i)))
 			ap->ops = &ata_dummy_port_ops;
-		ap->target_lpm_policy = ATA_LPM_MIN_POWER;
+		ap->target_lpm_policy = ATA_LPM_MAX_POWER;
 	rc = ahci_reset_controller(host);
With this change, the SSD is reliably detected on every boot. The libahci code enables the drive’s DEVSLP feature under the ATA_LPM_MIN_POWER policy and disables it otherwise, which reinforces my suspicion that a timing issue or some other interaction involving DEVSLP was causing the problem.
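To make the reasoning concrete, here is a rough userspace sketch of the decision as I understand it. It is a simplified model, not the actual libahci code: the enum and the function names are my own stand-ins, and the host_has_devslp flag is a hypothetical summary of whatever capability checks the controller driver performs. The one detail taken from the ATA side is that a drive advertises Device Sleep support in bit 8 of IDENTIFY DEVICE word 78.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's link power management policies;
 * these are not the kernel's own definitions. */
enum lpm_policy {
    LPM_MAX_POWER,
    LPM_MED_POWER,
    LPM_MIN_POWER,
};

/* IDENTIFY DEVICE word 78, bit 8: the drive advertises DEVSLP support.
 * `id` points at the 256-word IDENTIFY DEVICE data. */
static inline bool devslp_supported(const uint16_t *id)
{
    return (id[78] & (1u << 8)) != 0;
}

/* Simplified model of the behaviour described above: DEVSLP is only turned
 * on when the host supports it (hypothetical flag), the drive supports it,
 * and the target policy is the minimum-power one. */
static inline bool devslp_would_be_enabled(enum lpm_policy policy,
                                           bool host_has_devslp,
                                           const uint16_t *id)
{
    return host_has_devslp && devslp_supported(id) && policy == LPM_MIN_POWER;
}
```

Under this model, switching the target policy to the maximum-power setting keeps DEVSLP off even for a drive that supports it, which matches what I see with the patch applied.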