Unable to get back to a running TX2 system

When I originally set up the TX2 I was able to go through the installation without errors and get to where I had source code CUDA samples that I could compile and run on the target. I then foolishly decided to try to get NSight Eclipse running on my Ubuntu 16.04 host, which clobbered my machine and forced me to reinstall Ubuntu (not the topic I want to discuss right now, but just giving some background info).

I decided to give up on NSight and go back to what I originally had working on the target. Unfortunately, I have not been able to do that. I downloaded a JetPack (not positive it’s the same one I used on the first install, which worked). I ran it, which caused it to install code on the host and then reimage the target through forced recovery, as I did at the very beginning. Unfortunately, I have not been able to get all the way through without some obscure errors that seem to vary depending on what I try to do to fix it.

First, I was getting a missing CUDA key problem asking me to run apt-key on the host, which I did, but I then get other errors talking about missing libraries of various kinds. When I was finally able to get far enough along to get a CUDA samples directory, it did not have source and make files with it, but only had compiled samples which don’t seem to run.

I’m looking for a simple link that says “download this file and it will install everything correctly on your host and your TX2 without errors”. The file I have been using that is causing me all sorts of grief is JetPack-L4T-3.3-linux-x64_b39.run. I would copy and paste the errors into here but XTerm does not allow cut and paste. Whoo hoo for linux. I’m thinking that maybe I am using the wrong file which I picked out of a list of dozens of files. Would like something on the web site that says “if you have TX2, then use this file” instead of guessing. Thanks,

Roger

Terminal copy is sometimes by mouse highlight, then right click and choose copy, or CTRL-SHIFT-c (paste being CTRL-SHIFT-v). Any possibility of getting a log would help. Keep in mind that you don’t need to pick flash each time. For software installs at a later date, don’t connect the micro-B USB cable and don’t put the Jetson in recovery mode. Without flash you’ll need to manually enter the IP address as well.

In the forum itself, if you paste a log and want to add scrollbars and preserve formatting, then look at the “code” icon in the upper right (looks like “</>”). Highlight the text, click the icon.

If you want to attach a log file, then you have to attach to an existing post and cannot attach while creating the post. When you hover the mouse over the quote icon in the upper right of one of your existing posts a paper clip icon will also show up. The paper clip icon will attach a file. You probably want a log file name to end with a “.txt” suffix (the forum only allows certain file types and might get confused for some common file types if the extension isn’t the name it expects).

Some sort of log would help (FYI, there will be a log in a subdirectory of your JetPack directory).


What follows is more editorial than useful, YMMV…

FYI, Linux has a purely text mode environment available with no graphics…I know this isn’t the case for you since you are using JetPack, but someone may want to know. In console mode, for mouse copy and paste, you need to run gpm (“sudo apt-get install gpm”…possibly then “sudo systemctl enable gpm.service”), and then you can mouse highlight followed by right click to paste. There are variations depending on environment.

One of the complications (and strengths) of Linux is that pure console mode and GUI mode and every kind of look and feel are separate from the actual operating system. As a result you can reboot just graphics, or just reboot networking, and so on. A crash of one subsystem of the whole system doesn’t always require a reboot, and every distributor out there has their own idea of how look and feel should be (including copy and paste key bindings). So this unfortunately means that things like copy and paste key bindings may differ between two distributions, or even between which look and feel you choose within a single distribution.

One of the most hated features of Ubuntu is the lightdm/unity window manager setup. On my host I use KDE and would never develop on lightdm/unity. Some distributions of Linux chose window managers (basically the look and feel of the GUI) which are closer to what Windows works with. People often mistakenly believe they need that distribution to get that look and feel, but with some work the window managers (and look and feel) can be added to different Linux distributions. I’ve not bothered since I use Fedora with KDE, but there exists a possibility of switching Ubuntu’s desktop look and feel for one more like Windows without reinstalling the whole system. I wouldn’t recommend doing this for someone who isn’t “fluent” with Linux. If I were to develop on Ubuntu I’d probably use Kubuntu (which has KDE).

Here are samples that I cut from two different logs. It would seem to me that I should not be getting errors when I run an installer from a supported platform and completely reimage the target, but I am. That’s why I’m wondering if I am using the correct installer.

Preparing to unpack .../cuda-repo-l4t-9-0-local_9.0.252-1_arm64.deb ...
Unpacking cuda-repo-l4t-9-0-local (9.0.252-1) ...
Setting up cuda-repo-l4t-9-0-local (9.0.252-1) ...

The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub

OK
Connection to 192.168.0.101 closed.

dpkg-query: package 'cuda-toolkit-9-0' is not installed and no information is available
dpkg-query: package 'libfreeimage-dev' is not installed and no information is available
dpkg-query: package 'libopenmpi-dev' is not installed and no information is available
dpkg-query: package 'openmpi-bin' is not installed and no information is available
Use dpkg --info (= dpkg-deb --info) to examine archive files,
and dpkg --contents (= dpkg-deb --contents) to list their contents.
1
Error: CUDA cannot be installed on device. This may be caused by other apt-get command running on device when installing CUDA. Please use apt-get command in a terminal to make sure following packages are installed correctly on device before continuing:
cuda-toolkit-9-0 libgomp1 libfreeimage-dev libopenmpi-dev openmpi-bin
After these packages are installed on device, press Enter key to continue

--------------------------------------------------

Preparing to unpack .../libnvinfer4_4.1.3-1+cuda9.0_arm64.deb ...
Unpacking libnvinfer4 (4.1.3-1+cuda9.0) ...
dpkg: dependency problems prevent configuration of libnvinfer4:
 libnvinfer4 depends on cuda-cublas-9-0; however:
  Package cuda-cublas-9-0 is not installed.

dpkg: error processing package libnvinfer4 (--install):
 dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Errors were encountered while processing:
 libnvinfer4
Selecting previously unselected package libnvinfer-dev.
(Reading database ... 150142 files and directories currently installed.)
Preparing to unpack .../libnvinfer-dev_4.1.3-1+cuda9.0_arm64.deb ...
Unpacking libnvinfer-dev (4.1.3-1+cuda9.0) ...
dpkg: dependency problems prevent configuration of libnvinfer-dev:
 libnvinfer-dev depends on libnvinfer4 (>= 4.1.3); however:
  Package libnvinfer4 is not configured yet.

Roger

There may be previous package run issues still sticking around. Try this first:

sudo systemctl stop apt-daily.timer
# Wait a few seconds...
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub

Then verify you can do this:

apt search cuda-toolkit

Assuming it replies with “cuda-toolkit-9-0”, run:

sudo apt-get install cuda-toolkit-9-0

Does that work? Then if you want to start the automatic apt mechanism:

sudo systemctl start apt-daily.timer

Should this succeed, then you can try package changes again from JetPack. If not, then list results.
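To make "list results" concrete, here is a minimal sketch that reports which of the packages named in the JetPack error are still missing. It assumes it is run in a terminal on the machine that reported the error (most likely the Jetson); `report_missing` is a made-up helper name, not something JetPack provides.

```shell
# Hypothetical helper (not part of JetPack): print each named package that
# dpkg does not report as fully installed on the machine where this runs.
report_missing() {
    for pkg in "$@"; do
        # dpkg-query prints "install ok installed" for a healthy package.
        if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
            printf '%s\n' "$pkg"
        fi
    done
}

# Example use, with the package list taken from the JetPack error message:
#   report_missing cuda-toolkit-9-0 libgomp1 libfreeimage-dev libopenmpi-dev openmpi-bin
```

An empty result would suggest that all of the listed packages are installed; any names printed are the ones still worth installing by hand before pressing Enter in JetPack.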

My first problem is that neither your instructions nor almost any of the instructions given to me on the installation console say whether the suggestion is something that I should type into the host system or into the target platform.
I tried your first command on the host, waited, and then ran the apt-key command, which failed.

I then started all over by uninstalling and restarting the JetPack 3.3 install on my host system. It went through all the steps again, including re-imaging the target, and then failed at the same place with the console message:

The public CUDA GPG key does not appear to be installed.
To install the key, run this command:
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub

OK
Connection to 192.168.0.101 closed.
dpkg-query: package 'cuda-toolkit-9-0' is not installed and no information is available.
dpkg-query: package 'libfreeimage-dev' is not installed and no information is available.
dpkg-query: package 'libopenmpi-dev' is not installed and no information is available.
dpkg-query: package 'openmpi-bin' is not installed and no information is available.
Use dpkg --info (= dpkg-deb --info) to examine archive files, and dpkg --contents (= dpkg-deb --contents) to list their contents.
1
Error: CUDA cannot be installed on device. This may be caused by other apt-get command running on device when installing CUDA.  Please use apt-get command in a terminal to make sure following packages are installed correctly on device before continuing:
cuda-toolkit-9-0 libgomp1 libfreeimage-dev libopenmpi-dev openmpi-bin
After these packages are installed on device, press Enter key to continue

Forgive me if there are any typos in the above because I manually copied it so that I could paste it here.

I then decided that maybe it wants me to follow the instructions on the target, so I performed the apt-key command which seemed to work (“OK”).
Then I did a sudo apt-get install libgomp1, which worked.
Then I tried the other 3 files that it said it wants and they all failed with the message “Unable to locate package…”
Then I did a sudo apt-get install cuda-toolkit-9-0, which listed many things and finally appeared to succeed with a bunch of “Setting up cuda-xxx” messages followed by a “Processing triggers for libc-bin…” message.

So, not sure what to do now since 3 of the requested files did not install and I am sitting at the prompt on the host that tells me to press Enter once I have it all installed.

Roger

The above commands would apply to either the Jetson (if the Jetson has the error), or an Ubuntu host PC (if it is the PC giving the error). The trick is that when running JetPack it can be talking about host or Jetson (most people are talking about the Jetson). My assumption in the above was the Jetson.

Note that an apt-key might differ between PC and Jetson…not sure.

Your log starts at a location which is suggesting running the apt-key command from the Jetson, and then starting again on the package install to Jetson (without flashing again). Verify first that the Jetson has this:

ls /var/cuda-repo-9-0-local/7fa2af80.pub

…then, if present, do the recommended command from Jetson:

sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub

What followed after closing connection to 192.168.0.101 is probably for host, but you should ignore host for now. Re-run JetPack, disable every package except for CUDA for Jetson in the menu, check to see if 192.168.0.101 is still a valid address, and then run again like that. See if it claims CUDA installed to the Jetson. This is the cornerstone software, and if this works, then you should be able to install all other Jetson packages.

Note that you might choose to install samples after other packages. Samples complicate things since JetPack wants to cross-compile them on your host. Even though it might be installing samples to the Jetson, it is in fact doing some work on the host as well. Simply skip this until everything else works.

Note that a host might have a similar issue with wanting a key for CUDA. If you’ve selected to install software only to host, and pick only CUDA, then it might give you a similar apt-key command to run from host. Just avoid host installs until Jetson is where it needs to be.

Here is what I tried today:

Verified that the 7fa2af80.pub key was there and able to install it per your request.

Selected just to install Cuda toolkit on the target (host and target image already installed).

It prompted me for IP address, username, password. I checked ifconfig on the target and verified that it is using 192.168.0.101. (Had to install sshpass on the host to get past this dialog).

It says:
Post installation Jetson Tx2/Tx2i
Following actions will be performed at this stage
And then a blank dialog.
If you proceed, nothing gets installed and it says in a console:
Installation of target components finished, close this window to continue
Then “Installation complete”.

I did this several times, verifying that my selections were precise.
I’m afraid to tell it to flash the image first and then install because I’ll be back to the error I was seeing before.

So, then I ran component manager again and told it to install the Cuda samples (under the presumption that maybe the Cuda toolkit is already installed).
It would not let me select the samples without also selecting the toolkit. So, I selected both and proceeded. It then gave me the dialog showing that there was nothing to do.

So, I ran it again and selected to flash the OS image and install the toolkit and the samples. Same thing. There was nothing for it to do.

I really wish this tool would just do what I tell it rather than second guessing me. It’s the most bizarre installer that I have ever tried to use.

Next I uninstalled everything, deleted host directories, and started the JetPack 3.3 from scratch.
Selected ONLY host setup and target reflash & Cuda toolkit.
Got the error about the apt-key issue, followed by missing library files, etc.
Performed the sudo apt-key command.
Started the JetPack again and selected only the Cuda toolkit, nothing else.
This time the list was not blank as before, saying that it was about to install the Cuda toolkit on the target.
Console window scrolled a bunch of things for a while.
Then got the age-old error that CUDA cannot be installed on device. This may be caused by other apt-get command running on device when installing CUDA…
Told me I needed to install the toolkit along with several libraries and to press a key when it was finished.

Does your TX2 have the “/var/cuda-repo-9-0-local/” directory? If so, then all of the deb packages are there. You might be able to try this:

sudo dpkg -i /var/cuda-repo-9-0-local/cuda-core-9-0_9.0.252-1_arm64.deb

(I might have a different cuda version, you’d need to adjust for that file name if so)

I am also curious if anything shows up from:

sudo dpkg -l | egrep -i cuda

I typed “sudo dpkg -l | egrep -i cuda” and got:
ii cuda-repo-l4t-9-0-local 9.0.252-1
arm64 cuda repository configuration files

I also ran “sudo dpkg -i /var/cuda-repo-9-0-local/cuda-core-9-0_9.0.252-1_arm64.deb” and got:
Reading database …
Preparing to unpack …
Unpacking …
dpkg: dependency problems prevent configuration of …
cuda-core-9-0 depends on cuda-license-9-0; however:
Package cuda-license-9-0 is not installed.
cuda-core-9-0 depends on cuda-misc-headers-9-0; however:
Package cuda-misc-headers-9-0 is not installed.

I then ran dpkg for the license and the misc-headers deb files in that directory and they installed okay. I then reran the cuda-core install and that appeared to finish without errors. Not sure where to go from here with the host sitting at the error message. I suppose I should cancel out of that and see if I can then install other things in the Component Manager list that we left off.
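For anyone repeating this by hand, the missing dependencies can be pulled out of the dpkg output instead of being read off the screen. This is only a sketch: `missing_deps` is a made-up helper name, and it assumes the dpkg messages have the same "Package ... is not installed." wording as shown above.

```shell
# Hypothetical helper (not part of dpkg or JetPack): read dpkg output on stdin
# and print the name of every package reported as "not installed", once each.
missing_deps() {
    sed -n 's/^[[:space:]]*Package \([^[:space:]]*\) is not installed\.$/\1/p' | sort -u
}

# Example use on the Jetson (file name as in this thread; adjust for your CUDA version):
#   sudo dpkg -i /var/cuda-repo-9-0-local/cuda-core-9-0_9.0.252-1_arm64.deb 2>&1 | missing_deps
```

Each name printed should correspond to a .deb file in the same repository directory, which can then be installed with dpkg before retrying the package that failed.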

So, after the above, I went back and started the Component Manager again and selected to just install the CUDA samples. It told me that this depends on the CUDA Toolkit (which I presume the Component Manager has no way to know, or is too lazy to figure out, that I installed manually above). So, it selected the toolkit automatically for me and would not let me select just the CUDA samples.

So, I proceeded, and it whirled around for a while and came back with the same error that I was getting before concerning libgomp1, libfreeimage-dev, libopenmpi-dev, and openmpi-bin.

So, back at square one.

Next, I pressed Enter and it proceeded to compile the samples and push them to the target, creating an NVIDIA_CUDA-9.0_Samples directory. I went into it and found the oceanFFT example which had previously run, and tried to run it. I got this:

./oceanFFT: error while loading shared libraries: libcufft.so.9.0 …

But then was able to run other simpler things like simpleZeroCopy and vectorAdd. So, I’m assuming that I just need to go back and install more supporting tools in Component Manager and it will be considered up and running.

Not sure why loading CUDA Toolkit never succeeds, nor sure why I need it, but things appear to run which is what I want for now. Thanks

Well, not so fast. When I go back to install everything else, it still flags that I don’t have CUDA toolkit installed and insists that I add that to the list of things to install. Would be nice to figure out how to fool it. I despise installers that are so automated that they don’t let you manage workarounds.

Are you running previously built binaries, or are you able to build these with your current setup?

If you can build, then probably you have nvcc working and the required libs are available. vectorAdd and simpleZeroCopy need only the CUDA runtime, which nvcc links statically by default, so they load no CUDA dynamic libs. oceanFFT, on the other hand, requires cuFFT.

libcuFFT should be in folder /usr/local/cuda-9.0/targets/aarch64-linux/lib with other cuda libs.

Do you have these lines at the end of /home/nvidia/.bashrc :

export PATH=/usr/local/cuda-9.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:

/usr/local/cuda-9.0/bin is where nvcc is.
/usr/local/cuda/lib64 is a link to /usr/local/cuda-9.0/targets/aarch64-linux/lib, and ld will look there for dynamic libs.
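A quick way to confirm those two settings in the current shell, as a minimal sketch. `in_path_list` is a made-up helper name, and the directories are the CUDA 9.0 locations mentioned above; adjust them if your version differs.

```shell
# Hypothetical helper: succeed if directory $1 appears as an entry in the
# colon-separated list $2 (for example the value of PATH or LD_LIBRARY_PATH).
in_path_list() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the Jetson:
#   in_path_list /usr/local/cuda-9.0/bin "$PATH" || echo "nvcc directory is not on PATH"
#   in_path_list /usr/local/cuda-9.0/lib64 "$LD_LIBRARY_PATH" || echo "CUDA lib directory is not on LD_LIBRARY_PATH"
```

The match is on whole entries, so a trailing colon in LD_LIBRARY_PATH does not affect the result.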

You can check if oceanFFT can find these with:

ldd /usr/local/cuda-9.0/samples/5_Simulations/oceanFFT/oceanFFT
	...
	libcufft.so.9.0 => /usr/local/cuda-9.0/targets/aarch64-linux/lib/libcufft.so.9.0 (0x0000007f7a9b8000)
	...
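If the ldd output is long, the libraries the loader cannot resolve can be filtered out of it. A minimal sketch follows; `unresolved_libs` is a made-up helper name, and it relies on ldd marking such libraries with "=> not found".

```shell
# Hypothetical helper: read ldd output on stdin and print each library that
# the dynamic loader could not resolve (lines of the form "name => not found").
unresolved_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Example use on the Jetson (sample path as above):
#   ldd /usr/local/cuda-9.0/samples/5_Simulations/oceanFFT/oceanFFT | unresolved_libs
```

If this prints libcufft.so.9.0, then either the cuFFT package is not installed or its directory is not on the loader's search path.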

I believe it’s executing the newly built binaries.

When I went back and selected the other things in Component Manager, it got through some of them but then failed on tensorrt. I may or may not need these things, but it would be nice to know why the installer is unable to execute all of the required installations on the target.

I went into /var/cuda-repo-9-0-local and started running dpkg on the various libraries, and when a library failed because of a dependency, I went and ran dpkg on that dependency. I got through a number of these, but then came across cuda-cudart-dev-9-0 which “installed” but the error said that it was not yet “configured”. Okay, well, I don’t know how to configure apparently.
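On the "not yet configured" state: dpkg records it in the two-letter status column of `dpkg -l`, where the second letter is U (unpacked), F (half-configured) or H (half-installed) for packages that were not finished. The sketch below lists such packages; `unconfigured_packages` is a made-up helper name. The usual follow-up is `sudo dpkg --configure -a`, which asks dpkg to configure everything that is unpacked but pending, though it may still stop on a dependency that was never installed.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print packages whose
# current state is unpacked (U), half-configured (F) or half-installed (H).
unconfigured_packages() {
    awk '$1 ~ /^[a-z][UFH]/ { print $2 }'
}

# Example use on the Jetson:
#   dpkg -l | unconfigured_packages
#   sudo dpkg --configure -a
```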

Again, I probably have enough working to be able to limp through some simple demos and experiments, but it would be nice to know why it can’t just all install and configure itself as advertised.

Roger