With TLT1.0 and TLT2.0, the way to call `tlt-train`, `tlt-prune`, `tlt-export`, etc. was to start up the TLT container and then run these commands in a Jupyter Notebook or via the command line.
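For reference, our current TLT2.0 flow looks roughly like this (the image tag, mount paths, and spec paths below are from our setup and only illustrative; the exact `tlt-train` arguments may differ):

```shell
# Start the TLT 2.0 container interactively (check NGC for the current tag)
docker run --rm -it --gpus all \
  -v /data/experiments:/workspace/experiments \
  nvcr.io/nvidia/tlt-streamanalytics:v2.0_py3 /bin/bash

# Then, inside that same long-lived container, run each pipeline step:
tlt-train detectnet_v2 \
  -e /workspace/experiments/specs/train.txt \
  -r /workspace/experiments/results \
  -k $NGC_API_KEY
```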
I haven’t personally tested TLT3.0 yet, but from the documentation it seems that TLT3.0 is now used as a Python package (the TLT Launcher), and any call to `tlt train`, `tlt prune`, or `tlt export` automatically starts a container with the necessary dependencies and a specific entrypoint.
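If I’m reading the install docs correctly, the new host-side flow would be something like this (package names are as I understood them from the docs; please correct me if I’m wrong):

```shell
# Install the TLT launcher on the host
pip3 install nvidia-pyindex
pip3 install nvidia-tlt

# Each call transparently pulls and starts the matching task container
tlt detectnet_v2 train --help
```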
From: https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/tlt_launcher.html:
The tasks are broadly divided into computer vision and conversational AI. For example, DetectNet_v2 is a computer vision task for object detection in TLT which supports subtasks such as `train`, `prune`, `evaluate`, `export`, etc. When the user executes a command, for example `tlt detectnet_v2 train --help`, the TLT launcher does the following:
- Pulls the required TLT container with the entrypoint for DetectNet_v2
- Creates an instance of the container
- Runs the `detectnet_v2` entrypoint with the `train` sub-task
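In other words, as far as I can tell each launcher invocation is roughly equivalent to something like the following, with the container living only as long as one sub-task (the image name/tag, mounts, and in-container command here are purely my guess, for illustration):

```shell
# Hypothetical equivalent of a single launcher call: a fresh container is
# created, runs one sub-task via the task entrypoint, and then exits
docker run --rm --gpus all \
  -v /data/experiments:/workspace/experiments \
  nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 \
  detectnet_v2 train --help
```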
My questions relate to executing TLT tasks from within the same container. To give an example: if we want to train a model with TLT2.0 in a cloud GPU environment, we can simply start the TLT2.0 container on the VM and execute all steps of the training pipeline from that same container. With TLT3.0, however, since each command seems to instantiate and run a new container, we face a choice between running containers inside a container, or handling data persistence of large datasets across multiple short-lived containers.
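On the persistence side, I did notice the launcher reads a mounts file so that host directories are bind-mounted into every container it spawns; if I understand the docs correctly, something like this (paths are placeholders):

```shell
# ~/.tlt_mounts.json tells the launcher which host paths to mount into each
# spawned container, so large datasets survive across launcher invocations
cat > ~/.tlt_mounts.json <<'EOF'
{
    "Mounts": [
        {
            "source": "/data/experiments",
            "destination": "/workspace/experiments"
        }
    ]
}
EOF
```

That covers the datasets, but it still leaves the container-in-container question when the launcher itself has to run inside our own image.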
Would it be possible to run the TLT launcher from within a custom container? This would mean we’d be launching the individual TLT containers inside the custom container.
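The setup we have in mind would be something like mounting the host’s Docker socket, so the launcher creates sibling containers on the host rather than truly nested ones (the custom image name is hypothetical):

```shell
# Hypothetical: run our custom container with the host Docker socket mounted,
# so a TLT launcher installed inside it can create sibling containers
docker run --rm -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /data/experiments:/data/experiments \
  our-custom-image:latest /bin/bash

# Note: since the sibling containers' bind mounts are resolved by the *host*
# daemon, any paths in ~/.tlt_mounts.json must also exist on the host itself.
```

Is this kind of setup something the launcher is expected to support?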
OR, is it possible for us to manually start the TLT container (with no entrypoint) rather than launching it via the TLT launcher? In this setup we could then run commands inside the TLT container as normal, just like we currently do with TLT2.0 and TLT1.0.
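i.e., something along these lines; overriding the entrypoint is standard Docker, but the TLT3.0 image name/tag and the in-container command are guesses on my part:

```shell
# Hypothetical: start the TLT 3.0 task container directly with its entrypoint
# overridden, then drive the tasks interactively as we do with TLT 2.0
docker run --rm -it --gpus all \
  --entrypoint "" \
  -v /data/experiments:/workspace/experiments \
  nvcr.io/nvidia/tlt-streamanalytics:v3.0-py3 /bin/bash

# Inside the container, call the task entrypoint ourselves, e.g.:
# detectnet_v2 train -e /workspace/experiments/specs/train.txt ...
```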
Thanks and I look forward to continued developments with the TLT/DS workflow!