Steps to a production-quality system

Hi,

I’m looking into what NVIDIA recommends we do before releasing a product that has a Jetson embedded. Here are a few items of concern I’m thinking about, but I would be interested in the community’s experience, as well as NVIDIA’s recommendations.

Updating System

  • OS (say a serious security issue comes up)
  • JetPack (a new version of JetPack comes out, and we’d want to update our system with its content)
  • Over-the-air updates

Optimizations & OS lockdowns

Startup procedures

  • start your processes with systemd
  • wait for camera daemon or other dependencies to be ready

Post startup / runtime subsystem error handling

  • detecting and restarting the camera daemon

Any other items we should tack on the list?
Perhaps NVIDIA has a sample production filesystem?

-Quincy

Hi quincy.cs

I’m not sure what kind of recommendations you are looking for, since the question is very broad. I also don’t know what kind of product you would like to develop or what warranty you want to provide. At a minimum, you need to define your product quality level and create the relevant tests to ensure that quality before shipping, covering HW, SW, reliability, stability, and so on. Regarding customer service, the service flow, convenience, and cost should all be considered together.

Regarding a sample production filesystem, you could refer to [url]http://developer.nvidia.com/embedded/dlc/l4t-root-filesystem-source-28-2-ga[/url]

Cheers

@kayccc , thanks.

I was looking into “Updating Jetpack (update to the latest version)”. Do you know if this is possible after we ship a product? E.g., we provide customers a product and later want to trigger an over-the-air system update to get the latest JetPack installed.

Hi quincy.cs,

Why do you need to update JetPack? JetPack is just an installer that helps developers update the developer tools.
I suppose you mean the system image, right? AFAIK, over-the-air system update is not supported on current L4T products.

Cheers.

@kayccc , I think we are on the same page.

I have a product that has been given to a customer. It has a TX2 with everything JetPack 3.2 installs. Now a new JetPack, 3.3, comes out, and the question is: how can this TX2 get updated with everything that JetPack 3.3 would install? It sounds like this is a gap in the platform.

You could update the rootfs while booted from a rescue SD card or some other alternate rootfs. This would not work, though, if other parts of the system changed, e.g., partitions beyond those that already exist, or the device tree. Also, anything that interrupts the update could leave the device unusable until a full flash is done.

Do you need the end user to have the SD card available? If not, then you might be in luck. The original flash could tell the boot loader to look for the SD card’s “/boot” and make that the default boot, acting as a custom rescue boot. If that rescue boot has reason to believe the install is good, it could chain load back to the eMMC. There are a lot of details to work out to make something like that work, and it wouldn’t be easy, but when you are in a rescue mode or alternate boot system you could easily update the entire eMMC with dd (and dd could stream outside data). A mechanism could even be put in place to “mark” how much of the dd was complete in order to pause and resume. Not a trivial thing to set up, but it could be done.
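The "mark how much was complete" idea can be sketched as a chunked copy that records its offset after every chunk. This is only a minimal illustration, not something from NVIDIA: the function names and the marker file are hypothetical, and it assumes the source image is a seekable file (a streamed download would need its own resume support, such as ranged requests).

```python
import os


def _write_marker(marker_path, offset):
    """Record progress atomically so a crash never leaves a torn marker."""
    tmp_path = marker_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(offset))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, marker_path)


def resumable_copy(src_path, dst_path, marker_path, chunk_size=4 * 1024 * 1024):
    """Copy src to dst in chunks, resuming from the offset stored in marker_path."""
    offset = 0
    if os.path.exists(marker_path):
        with open(marker_path) as f:
            text = f.read().strip()
        offset = int(text) if text else 0

    # Open the destination for in-place update (create it if it is missing).
    mode = "r+b" if os.path.exists(dst_path) else "w+b"
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        src.seek(offset)
        dst.seek(offset)
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # data reaches storage before the marker moves
            offset += len(chunk)
            _write_marker(marker_path, offset)
    return offset
```

The marker is only advanced after the chunk has been synced, so after an interruption the worst case is that one chunk gets written twice. The marker file has to live somewhere that survives the interruption, i.e. not on the partition being overwritten.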

You will need someone who knows Linux system management and disk partitioning to help you. Almost none of this is Jetson-specific, and all of it is Linux-specific. (Someone with experience in embedded Linux, as opposed to running Linux on servers or desktops, may be the best match for this need.)

The safest approach is to set up a dual-partition option, where you download new updates to a new partition, and then update your boot manager to try the new system. In the best of worlds, you also get the boot manager to record whether it successfully booted or not, and if it finds it didn’t, try booting the old image.
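The try-new-then-fall-back logic can be sketched as a small state machine. This is a hedged illustration only: the slot names, the BootState fields, and the retry count are all hypothetical, and a real boot manager would have to keep this state somewhere persistent that both it and the OS can write.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    active: str          # slot known to boot correctly, "A" or "B"
    pending: str = ""    # slot holding a freshly installed image, or ""
    tries_left: int = 0  # remaining boot attempts for the pending slot


def stage_update(state, max_tries=3):
    """After writing a new image to the inactive slot, ask for a trial boot."""
    state.pending = "B" if state.active == "A" else "A"
    state.tries_left = max_tries
    return state.pending


def choose_slot(state):
    """Run by the boot manager on every boot: pick the slot to start."""
    if state.pending and state.tries_left > 0:
        state.tries_left -= 1  # count the attempt before handing over control
        return state.pending
    state.pending = ""         # no update, or it never confirmed: fall back
    return state.active


def mark_boot_successful(state, booted):
    """Run by the OS once it is up and healthy: make the new slot permanent."""
    if booted == state.pending:
        state.active = booted
        state.pending = ""
        state.tries_left = 0
```

Decrementing the counter before booting is what makes the fallback work: if the new image hangs or crashes, it never calls mark_boot_successful, and the attempts eventually run out.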

However, the system you run is just files-on-disk. You can download a tar archive or zip file or whatever, verify it against some public key you install on the system (don’t lose the private key half you’d use for signing!) and then sudo un-tar/un-zip from the root, and you’ll end up with an “updated” system. Obviously, crashing in the middle of this process will leave the system with half-updated files, and it may or may not be bootable at that point. There are safer ways to do this, too, using multiple stage updates, atomic file-moves from temp files, and so forth. Exactly what’s best for you depends entirely on your specific deployment situation.
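As a rough sketch of the "verify, then atomic file-move from a temp file" step: the example below checks a SHA-256 digest as a simplified stand-in for the public-key signature check described above (a real deployment would verify a signature with a crypto library), then replaces one file atomically. The function names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def atomic_install(src_path, dst_path):
    """Replace dst_path so readers see the old or the new file, never a mix."""
    dst_dir = os.path.dirname(os.path.abspath(dst_path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir)
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            for chunk in iter(lambda: src.read(1024 * 1024), b""):
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, dst_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def install_if_verified(src_path, dst_path, expected_sha256):
    """Install only when the download matches the expected digest."""
    if sha256_of(src_path) != expected_sha256:
        return False
    atomic_install(src_path, dst_path)
    return True
```

Note that mkstemp creates the file with owner-only permissions, so a real updater would also need to restore the intended mode and ownership. This makes each file update atomic, not the whole set of files; that is where the dual-partition approach is safer.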

To turn off the GUI, change systemd’s default target so it no longer boots to graphical.target (typically you’ll want multi-user.target, which you can select with: systemctl set-default multi-user.target).

To enable or disable other functions, use systemctl enable/disable to turn on/off the services that you want/don’t want (such as auto-updates).
You may also want to define a service file of your own (search “systemd unit file”) so that your own software runs on start-up.

@linuxdev , thx. That’s helpful to think about. I am in luck, that end users don’t have access to anything on the jetson (including the SD).

@snarky, also thank you. Yeah would need to discover exact steps to go from tarball to dual boot. That’s probably much simpler than actually running jetpack installer on the end user’s jetson.

On the topic of defining my own service file using systemd, is there a way to determine if the camera daemons are ready to use? I’m seeing my service try to use the cameras too quickly without the daemons fully booted. I also wonder if this process of waiting for cameras to be available is different per camera driver.

In systemd, you need to add an ordering dependency with After= in your .service file (usually together with Wants= or Requires=, since After= only controls ordering) to make sure your service starts after its dependencies.
If the NVIDIA system services do not properly publish their ready state, either through block-and-fork or through D-Bus, then you’re in trouble: your programs have to test whether things are ready when they start up, and keep retrying for a while until they become ready …
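If it comes to testing readiness from your own program, a polling loop with a timeout is the usual shape. This is a minimal sketch: the probe is whatever check makes sense for your camera stack, and the device path in the example probe is purely an assumption, not something the camera daemon is known to provide.

```python
import os
import time


def wait_until_ready(is_ready, timeout_s=30.0, interval_s=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def device_node_exists(path="/dev/video0"):
    """Hypothetical probe: assumes the camera shows up as a device node."""
    return os.path.exists(path)


if __name__ == "__main__":
    if wait_until_ready(device_node_exists, timeout_s=60.0):
        print("camera looks ready, starting work")
    else:
        print("camera did not become ready in time")
```

A device node appearing does not guarantee the daemon behind it is accepting requests, so the most reliable probe is often to attempt the real operation (open the camera) and retry on failure.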

@snarky, thx. Yeah, and thinking through this, we should also consider the scenario where the camera daemon or some other dependency fails after our processes have been using it. E.g., everything’s fine for 3 hours, but then for whatever reason the camera daemon needs to be restarted. (Adding it to the original post here.) Curious if someone knows all the components that need to be restarted, e.g., whether it’s just the camera daemon or there are other services as well.
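One possible shape for the detect-and-restart part, as a sketch only: it asks systemd whether a unit is active and restarts it if not. The unit name is a placeholder, since which services actually need restarting is exactly the open question above, and the script would need enough privileges to restart system services.

```python
import subprocess


def service_is_active(unit, run=subprocess.run):
    """`systemctl is-active --quiet` exits with 0 only when the unit is active."""
    return run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def ensure_running(unit, run=subprocess.run):
    """Restart the unit if it is not active; return True if a restart was issued."""
    if service_is_active(unit, run):
        return False
    run(["systemctl", "restart", unit])
    return True


if __name__ == "__main__":
    # Placeholder name: substitute the camera daemon's real unit name.
    if ensure_running("example-camera-daemon.service"):
        print("camera daemon was down, restart requested")
```

This only catches a daemon that has exited. A daemon that is still running but hung would need an application-level check, for example a capture request that times out. For the exited case, systemd can also do the restart by itself if the unit sets Restart=on-failure.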