According to the training log below, the training is not running successfully. It fails while TensorFlow is creating the session:
2023-07-03 17:23:37,264 [INFO] tensorflow: Graph was finalized.
2023-07-03 17:23:37,288 [INFO] root: CUDA runtime implicit initialization on GPU:0 failed. Status: the provided PTX was compiled with an unsupported toolchain.
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/scripts/train.py>", line 3, in <module>
File "<frozen iva.detectnet_v2.scripts.train>", line 1032, in <module>
File "<frozen iva.detectnet_v2.scripts.train>", line 1011, in <module>
File "<decorator-gen-117>", line 2, in main
File "<frozen iva.detectnet_v2.utilities.timer>", line 46, in wrapped_fn
File "<frozen iva.detectnet_v2.scripts.train>", line 994, in main
File "<frozen iva.detectnet_v2.scripts.train>", line 853, in run_experiment
File "<frozen iva.detectnet_v2.scripts.train>", line 728, in train_gridbox
File "<frozen iva.detectnet_v2.scripts.train>", line 197, in run_training_loop
File "<frozen iva.detectnet_v2.training.utilities>", line 143, in get_singular_monitored_session
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1104, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 727, in __init__
self._sess = self._coordinated_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 878, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 647, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 290, in prepare_session
config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/session_manager.py", line 194, in _restore_checkpoint
sess = session.Session(self._target, graph=self._graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1585, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 699, in __init__
self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: the provided PTX was compiled with an unsupported toolchain.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'IsVariableInitialized_1035:0' shape=() dtype=bool>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "<frozen iva.detectnet_v2.training.utilities>", line 143, in get_singular_monitored_session
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1104, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 727, in __init__
self._sess = self._coordinated_creator.create_session()
File "<frozen moduluspy.modulus.hooks.hooks>", line 285, in begin
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/tf_should_use.py", line 198, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
But you mentioned earlier that “I have successfully trained a model that tests against my hold-back test set with 93.6% mAP”.
Am I missing something?
Also, which GPU are you using?
Could you share the output of nvidia-smi as well?
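If it is easier, the GPU name and driver version can also be collected with a small script. This is a minimal sketch, assuming Python 3.6+ and nvidia-smi on the PATH; it uses the standard --query-gpu and --format=csv,noheader options. The helper names parse_gpu_query and query_gpus are hypothetical and are not part of TAO.

```python
import subprocess
from typing import Dict, List

# Standard nvidia-smi --query-gpu field names to request, in output order.
FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_query(csv_text: str) -> List[Dict[str, str]]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader` output.

    Each non-empty line describes one GPU; the comma-separated values
    appear in the same order as FIELDS.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines defensively
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(FIELDS, values)))
    return gpus


def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi and return one dict per visible GPU."""
    # stdout=PIPE + universal_newlines keeps this compatible with Python 3.6,
    # which is the Python version inside the training container.
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_gpu_query(result.stdout)
```

Running print(query_gpus()) on the training machine prints one dict per GPU. The plain nvidia-smi output is still the most useful thing to paste, because its header also shows the CUDA version supported by the installed driver.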