Kernel locked on Orin

I haven’t upgraded to the latest JetPack yet, though I will soon. In the meantime, I have been managing the problem by limiting memory usage and applying a few workarounds, and I can now avoid it fairly reliably.

My issues occurred while using TensorFlow, and my workarounds were all of the following together:

  1. Enable dynamic memory allocation in TensorFlow.
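import tensorflow as tf

# Enable dynamic memory allocation (memory growth) on every GPU.
# This must run before TensorFlow initializes the GPUs.
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)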
  2. Use smaller batch sizes. Finding a safe size takes some trial and error; if the batch size is too big, it’s guaranteed to trigger the problem sooner or later.

  3. Clear the Keras backend every training epoch.

tf.keras.backend.clear_session()
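
For anyone wondering how the last workaround might fit into a training loop, here is a minimal sketch rather than my exact code. It assumes you train one epoch at a time; `run_epochs`, `train_one_epoch` and `cleanup` are hypothetical names I made up for illustration, and the extra `gc.collect()` is my own addition, not something the workaround requires.

```python
import gc
from typing import Callable, List


def run_epochs(
    train_one_epoch: Callable[[int], float],
    epochs: int,
    cleanup: Callable[[], None],
) -> List[float]:
    """Train one epoch at a time and call `cleanup` after each epoch.

    `train_one_epoch` (hypothetical) receives the epoch index and returns a loss.
    `cleanup` (hypothetical) is whatever frees backend state, for example
    tf.keras.backend.clear_session.
    """
    losses = []
    for epoch in range(epochs):
        losses.append(train_one_epoch(epoch))
        cleanup()     # release backend state accumulated during the epoch
        gc.collect()  # assumption: also collect unreachable Python objects
    return losses
```

With a Keras model this could be called as `run_epochs(lambda e: model.fit(x, y, epochs=1, batch_size=8).history["loss"][0], 10, tf.keras.backend.clear_session)`, where `model`, `x`, `y` and the batch size of 8 are placeholders for your own setup.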