Adding a new class after transfer learning has already been done, and/or continuing with different datasets and classes

How do I add new classes after transfer learning has already been done (say, starting from the DetectNet_v2 base weights from NGC)?

Assume the older data has been deleted and we want to continue from the pruned and retrained model from the previous training.

You can add the new class in the training spec.
If you want to continue from the pruned and retrained model from the previous training, set it as the pretrained model in the training spec.
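
As a rough illustration of the advice above, here is a minimal Python sketch that scans a DetectNet_v2 spec file for the places a new class and the pretrained model have to show up. It is not part of TLT. The field names (`target_class_mapping`, `target_classes`, `pretrained_model_file`, `load_graph`) are taken from my recollection of the DetectNet_v2 spec format, so check them against your own spec. The sample spec, the class names, and the model path are made up for the example, and the regex matching is deliberately naive (it is not a real protobuf parser).

```python
import re


def mapped_classes(spec_text):
    """Class names produced by the dataset mapping (the `value` fields)."""
    pattern = r'target_class_mapping\s*\{[^}]*?value\s*:\s*"([^"]+)"'
    return set(re.findall(pattern, spec_text))


def cost_function_classes(spec_text):
    """Class names that have a `target_classes` entry in the cost function."""
    pattern = r'target_classes\s*\{\s*name\s*:\s*"([^"]+)"'
    return set(re.findall(pattern, spec_text))


def pretrained_settings(spec_text):
    """Return (pretrained_model_file, load_graph); each is None if absent."""
    path = re.search(r'pretrained_model_file\s*:\s*"([^"]*)"', spec_text)
    graph = re.search(r'load_graph\s*:\s*(true|false)', spec_text)
    return (path.group(1) if path else None,
            graph.group(1) == "true" if graph else None)


def check_spec(spec_text):
    """List human-readable inconsistencies found in the spec text."""
    problems = []
    mapped = mapped_classes(spec_text)
    costed = cost_function_classes(spec_text)
    for name in sorted(mapped - costed):
        problems.append(f"class '{name}' is mapped but has no cost_function entry")
    for name in sorted(costed - mapped):
        problems.append(f"class '{name}' has a cost_function entry but is never mapped")
    path, _ = pretrained_settings(spec_text)
    if not path:
        problems.append("pretrained_model_file is not set")
    return problems


# Hypothetical spec fragment: "person" was added to the dataset mapping,
# but the cost function still only lists "car".
SAMPLE_SPEC = '''
dataset_config {
  target_class_mapping { key: "car" value: "car" }
  target_class_mapping { key: "van" value: "car" }
  target_class_mapping { key: "person" value: "person" }
}
model_config {
  pretrained_model_file: "/workspace/prev_run/model_pruned.tlt"
  load_graph: true
}
cost_function_config {
  target_classes { name: "car" }
}
'''

for problem in check_spec(SAMPLE_SPEC):
    print(problem)
```

The same idea extends to the other per-class sections of a spec (postprocessing, evaluation, rasterizer), and running it on both the training spec and the retraining spec makes it easier to spot a pretrained setting that differs between the two.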

I trained a custom model starting from the DetectNet_v2 hdf5 weights. Then I removed all the images and labels and started over with entirely new images and labels, this time with an extra class. I used the final pruned model from the previous training as the pretrained starting point for the new training. Training went fine and so did pruning, but during retraining I get this error:

2021-07-30 00:45:08,334 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 18/120: loss: 0.00010 Time taken: 0:01:17.138544 ETA: 2:11:08.131521
2021-07-30 00:45:12,040 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.974
2021-07-30 00:45:17,496 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.985
2021-07-30 00:45:22,901 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.512
2021-07-30 00:45:28,297 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.601
2021-07-30 00:45:33,690 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.629
2021-07-30 00:45:39,086 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.597
2021-07-30 00:45:44,505 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.362
2021-07-30 00:45:49,980 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.790
2021-07-30 00:45:55,461 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.738
2021-07-30 00:46:00,924 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.921
2021-07-30 00:46:06,408 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.704
2021-07-30 00:46:11,849 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.139
2021-07-30 00:46:17,284 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.202
2021-07-30 00:46:22,736 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.031
2021-07-30 00:46:25,342 [INFO] /usr/local/lib/python3.6/dist-packages/modulus/hooks/task_progress_monitor_hook.pyc: Epoch 19/120: loss: 0.00016 Time taken: 0:01:17.020867 ETA: 2:09:39.107530
2021-07-30 00:46:28,166 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.249
2021-07-30 00:46:33,636 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.849
2021-07-30 00:46:39,079 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.116
2021-07-30 00:46:44,515 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.185
2021-07-30 00:46:49,929 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.417
2021-07-30 00:46:55,410 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.732
2021-07-30 00:47:00,843 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.226
2021-07-30 00:47:06,283 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.144
2021-07-30 00:47:11,732 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.059
2021-07-30 00:47:17,131 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.570
2021-07-30 00:47:22,606 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.795
2021-07-30 00:47:27,994 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 55.684
2021-07-30 00:47:33,502 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.466
2021-07-30 00:47:38,969 [INFO] modulus.hooks.sample_counter_hook: Train Samples / sec: 54.877
2021-07-30 00:47:43,960 [INFO] iva.detectnet_v2.evaluation.evaluation: step 0 / 88, 0.00s/step
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/magnet_train.py", line 55, in main
  File "<decorator-gen-2>", line 2, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 773, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 691, in run_experiment
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 624, in train_gridbox
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/train.py", line 149, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1426, in run
    run_metadata=run_metadata))
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py", line 67, in after_run
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/validation_hook.py", line 73, in validate
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/evaluation/evaluation.py", line 165, in evaluate
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/postprocessing.py", line 146, in cluster_predictions
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/postprocessor/cluster.py", line 44, in cluster_predictions
AssertionError

From the log, you are training, pruning, and retraining with TLT 3.0-dp.

Can you share your training spec and retraining spec?

It is fixed now. I had the wrong pretrained model settings in the spec files.
