I’m sorry, I shouldn’t have written that we have a root cause. We do not. We know that the PCIe Gen3-to-Gen1 switch is what triggers the problem, but we do not yet understand why. This is thought to be a platform bug and is being investigated as such. I’ll update the thread as soon as I know more.
There will be some software changes to make the issue less likely to occur, but a real fix still depends on finding the root cause, and that involves multiple vendors and a lot more investigation.