SSH on Jetson TX2 failing at random intervals

We bought 5 new Jetson TX2 boards. After flashing them and installing the same packages on all 5, we ran into trouble with one of them. The problem is exactly the same as the one described here

Basically, when we try to ssh into the TX2 we sometimes get
ssh_dispatch_run_fatal: Connection to 10.136.63.111 port 22: incorrect signature
and at other times
ssh_dispatch_run_fatal: Connection to 10.136.63.111 port 22: message authentication code incorrect

From time to time we even manage to ssh into the board, but shortly afterwards the connection is closed with one of the two error messages above.

We have already tried re-flashing the board (twice, actually) but the problem is still there.
The fact that this happens on only 1 of the 5 boards smells like a hardware problem.
Before asking for a replacement board, is there any test you would suggest we perform?

I couldn’t say for sure, but it sounds like the host key on the TX2 may have changed. Given that the Ethernet MAC address will be constant across flashes, I am wondering if your client stored one host key from a previous ssh login to that TX2, and then the key changed.

If authentication is incorrect, then it could certainly be a hardware failure, for example data being lost on the network while the keys are being sent, but it is more likely to be a software issue.

At some point when this fails, can you perform a verbose ssh login and post the log here? Assuming the login account name is “nvidia”, and assuming the IP address is “192.168.55.1” (adjust for your case), then this would ssh with logging:
ssh -vvv nvidia@192.168.55.1 2>&1 | tee log_ssh.txt
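Since the failure is intermittent and shows up with two different messages, it may help to repeat the attempt several times, appending to the same log (for example with "tee -a" instead of "tee"), and then count how often each message appears. This is only a minimal sketch: it assumes the log file name from the command above, and tally_ssh_fatal is a hypothetical helper name, not something that ships with OpenSSH.

```shell
# Hypothetical helper: count each distinct ssh_dispatch_run_fatal reason
# found in a log file and print "COUNT REASON", most frequent first.
tally_ssh_fatal() {
    awk '/ssh_dispatch_run_fatal/ {
             # Drop everything up to and including "port N: " so only
             # the reason text remains.
             sub(/.*port [0-9]+: /, "")
             count[$0]++
         }
         END { for (r in count) printf "%d %s\n", count[r], r }' "$1" \
        | sort -rn
}
```

Running "tally_ssh_fatal log_ssh.txt" after a batch of attempts prints one line per distinct reason, which shows whether one message dominates or the two alternate.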

Also, when this occurs, please post the output of “ifconfig” from the TX2. I am curious whether it shows any sort of network error from the Jetson side.
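If the ifconfig output is awkward to copy off the console, the same kind of error counters can be read from sysfs on Linux. A minimal sketch, assuming the wired interface is named eth0 (check the actual name on your board). show_net_errors is a hypothetical helper, and its optional second argument exists only so the function can be pointed at a different directory tree.

```shell
# Hypothetical helper: print selected error counters for one interface.
# $1 = interface name (eth0 is an assumption, adjust for your board)
# $2 = optional base directory, defaults to /sys/class/net
show_net_errors() {
    iface="$1"
    base="${2:-/sys/class/net}"
    for counter in rx_errors tx_errors rx_dropped tx_dropped rx_crc_errors; do
        file="$base/$iface/statistics/$counter"
        # Skip counters that this kernel or driver does not expose.
        if [ -r "$file" ]; then
            printf '%s %s\n' "$counter" "$(cat "$file")"
        fi
    done
}
```

Calling "show_net_errors eth0" before and after a failed ssh attempt and comparing the two outputs would show whether any counter increased during the failure.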

Hi, and thanks for your reply.

Before trying to ssh I even removed the known_hosts file.
Here are the results:

The ifconfig output shows no errors of any kind, so it does not seem to be an Ethernet error.

It is odd because everything works as it should right up to the last moment. Then it fails with:

debug1: Entering interactive session.
debug1: pledge: exec
ssh_dispatch_run_fatal: Connection to 10.136.63.111 port 22: message authentication code incorrect

In one case I found on the web the cause was a network link constantly going up and down. I have seen that in other cases too, but your ifconfig output looked fully operational, with no failures. The only other fixes I’ve found on the web basically came from updating the software. It is a bit of a reach, but have you updated? I have to warn you to back up before updating, but then:

  1. sudo apt update
  2. sudo apt-get upgrade

There was a patch in the past which seems to have caused this bug, and I am hoping an update will get past this if it is simply a problem with one sshd release.

The results attached to the previous message were obtained with the system already up to date. The same goes for the other 4 boards, which have worked properly so far.

I suppose it could be a network error, such as truncated data, but I saw no sign of that. The ifconfig output showed correct operation. There could be an issue with a low power mode. Is it possible that this error occurs only after the Jetson has been sitting without keyboard/mouse interaction long enough that it might be in some sort of sleep or low power mode?
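On the truncated-data possibility: one way to look for corruption outside of ssh would be to copy a file to the Jetson over the same network by whatever non-ssh path is available (left open here) and compare its checksum with the one computed on the sending machine. A minimal sketch, assuming sha256sum from coreutils is installed on both sides. verify_sha256 is a hypothetical helper name.

```shell
# Hypothetical helper: compare a file's SHA-256 with an expected hex digest.
# $1 = path to the received file
# $2 = digest printed by "sha256sum" on the sending machine
# Prints "match" and returns 0, or prints "MISMATCH" and returns 1.
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

Repeating the transfer a number of times with a reasonably large file would make an intermittent fault more likely to show up. A mismatch would point at corruption somewhere between the two machines, although it would not by itself say whether the network, memory, or storage is at fault.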

Not really, the test was performed with keyboard and monitor connected. And the ssh problems occur both when mouse/keyboard are attached and when they are not.

I don’t know what else to test on this. You might need to ask the maintainers of the ssh package itself. It is quite possibly a hardware issue, but I’m not ready to say so yet, since the board functions correctly for other uses and the problem seems to be associated specifically with ssh, and then only on one board.

Anyone else have any ideas on why ssh would fail like this?

OP here. I ended up just returning the failing unit and getting a new one, and that fixed it. Yours might have a hardware issue just like mine did.