Hello -
I am running a virtual machine to train my models and run Jupyter Notebook. While working through TLT - CV Training, I trained my model, pruned it, and retrained the pruned model. After completing steps 1-10, I shut my virtual machine down, then came back later to run step 10.A (Int8 Optimization).
For step 10 (Model Export), before I shut the machine down, I was getting the following error:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_final
# Removing a pre-existing copy of the etlt if there has been any.
import os
output_file=os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'],
                         "experiment_dir_final/resnet18_detector.etlt")
if os.path.exists(output_file):
    os.system("rm {}".format(output_file))
!tlt detectnet_v2 export \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY
2021-04-20 13:01:23,001 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Traceback (most recent call last):
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/export.py", line 12, in <module>
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 198, in launch_export
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 155, in run_export
AssertionError: Default output file /workspace/tlt-experiments/detectnet_v2/experiment_dir_final/resnet18_detector.etlt already exists
Traceback (most recent call last):
File "/usr/local/bin/detectnet_v2", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/entrypoint/detectnet_v2.py", line 12, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
2021-04-20 13:01:34,267 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
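One thing I could check for this first error: the cleanup lines delete the file through the host path under LOCAL_EXPERIMENT_DIR, while the export writes to the container path under USER_EXPERIMENT_DIR. Also, `os.system("rm ...")` only returns an exit status that the cell ignores, so a failed delete goes unnoticed. Below is a minimal sketch (the helper name `remove_stale_export` is my own, hypothetical) that deletes with `os.remove` and then confirms the file is really gone. It assumes LOCAL_EXPERIMENT_DIR is the host directory that ~/.tlt_mounts.json mounts into the container as USER_EXPERIMENT_DIR; I haven't confirmed that mapping for my setup.

```python
import os
from pathlib import Path


def remove_stale_export(local_experiment_dir: str,
                        relative_path: str = "experiment_dir_final/resnet18_detector.etlt") -> Path:
    """Delete a previously exported .etlt on the host side and verify it is gone.

    Assumption: local_experiment_dir is the host directory that the TLT
    launcher mounts into the container as USER_EXPERIMENT_DIR.
    """
    output_file = Path(local_experiment_dir) / relative_path
    # Same effect as the `!mkdir -p ...` line in the notebook cell.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    if output_file.exists():
        # os.remove raises an exception on failure (e.g. a permission error
        # if Docker, running as root, created the directory), whereas
        # os.system("rm ...") just returns a nonzero status.
        os.remove(output_file)
    if output_file.exists():
        raise RuntimeError(f"{output_file} still exists after removal")
    return output_file
```

In the notebook I would call `remove_stale_export(os.environ['LOCAL_EXPERIMENT_DIR'])` in place of the `if os.path.exists(...)` lines, just before the `!tlt detectnet_v2 export` command.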
After shutting the machine down, restarting it, and running the same step 10 (Model Export) commands, I now get this error instead:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_final
# Removing a pre-existing copy of the etlt if there has been any.
import os
output_file=os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'],
                         "experiment_dir_final/resnet18_detector.etlt")
if os.path.exists(output_file):
    os.system("rm {}".format(output_file))
!tlt detectnet_v2 export \
-m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/resnet18_detector_pruned.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
-k $KEY
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-3-eab212de10ab> in <module>
2 # Removing a pre-existing copy of the etlt if there has been any.
3 import os
----> 4 output_file=os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'],
5 "experiment_dir_final/resnet18_detector.etlt")
6 if os.path.exists(output_file):
/usr/lib/python3.7/os.py in __getitem__(self, key)
677 except KeyError:
678 # raise KeyError with the original key value
--> 679 raise KeyError(key) from None
680 return self.decodevalue(value)
681
KeyError: 'LOCAL_EXPERIMENT_DIR'
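From the traceback, LOCAL_EXPERIMENT_DIR is simply not set in the current kernel. Environment variables set through `os.environ` (or `%env`) only exist in the running kernel process, so my guess is that the reboot wiped them and the notebook's earlier setup cells need to be re-run. Below is a small guard I could put at the top of the export cell so it fails with a readable message instead of a bare KeyError. The helper names and the exact variable list are my own assumptions, based on the variables this cell and the `!tlt` line use.

```python
import os

# Variables used by the export cell and the `!tlt detectnet_v2 export` line.
# This list is an assumption based on that cell, not an official TLT list.
REQUIRED_VARS = ("LOCAL_EXPERIMENT_DIR", "USER_EXPERIMENT_DIR", "KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def require_env_vars(names=REQUIRED_VARS, env=None):
    """Raise a descriptive error if any required variable is missing."""
    missing = missing_env_vars(names, env)
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
            + ". Re-run the notebook's setup cells after restarting the kernel."
        )
```

Calling `require_env_vars()` as the first line of the cell would list every missing variable at once rather than stopping at the first KeyError.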
Any ideas as to what is going on and how to solve this issue?
Thanks,
Bryan
P.S. I’m a noob at this, so any and all help is much appreciated. Getting to this point took a lot of effort, including lots of configuration to get IPython, Jupyter, etc. to work nicely together. I’d love to solve this and move on to putting the models on my Nano.