Jetsons were designed more for running pre-trained models than for training them. If you train on a beefier desktop PC (or several), you should then be able to use the TX2 to run the model fast enough to be useful in real time.
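For illustration, here's a minimal inference-only sketch in PyTorch (assuming you've installed a Jetson-compatible PyTorch wheel; the model architecture and the `model.pth` file name are placeholders for whatever you trained on the desktop):

```python
import torch
import torchvision.models as models

# Load an architecture and weights that were trained on a desktop PC.
# "resnet18" and "model.pth" are stand-ins for your own network.
model = models.resnet18()
model.load_state_dict(torch.load("model.pth", map_location="cuda"))
model.to("cuda").eval()   # inference only -- no gradients, no training
model.half()              # half precision helps throughput on the TX2's Pascal GPU

with torch.no_grad():
    frame = torch.rand(1, 3, 224, 224, device="cuda").half()  # stand-in for a camera frame
    scores = model(frame)
    print(scores.argmax(dim=1))
```

The point is that all of the expensive work (backprop, optimizer state) is gone; the device only has to do a forward pass.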
Keep in mind that on edge devices the RAM is shared between the OS and the GPU through an integrated memory controller, whereas a dedicated PCIe card has much faster RAM built directly onto the GPU, with no trip through the CPU's memory controller.
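If you want to see what that difference costs, you can time host-to-device copies yourself. A rough sketch (PyTorch; the 256 MB buffer size is arbitrary) measures copy bandwidth from pinned host memory; on a discrete card this crosses PCIe, while on a Jetson both sides live in the same physical DRAM:

```python
import time
import torch

n_bytes = 256 * 1024 * 1024  # 256 MB test buffer (arbitrary size)
host = torch.empty(n_bytes, dtype=torch.uint8, pin_memory=True)  # pinned host memory

torch.cuda.synchronize()
start = time.perf_counter()
dev = host.to("cuda", non_blocking=True)  # host-to-device copy
torch.cuda.synchronize()                  # wait for the async copy to finish
elapsed = time.perf_counter() - start

print(f"{n_bytes / elapsed / 1e9:.2f} GB/s host-to-device")
```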
Comparing a 256 CUDA core GPU to one with 3000+ CUDA cores is another reason training is slow on a Jetson, and training needs that extra compute far more than running an already trained model does.
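You can query what you're working with at runtime. This quick check (PyTorch again) prints the streaming multiprocessor count, from which the core count follows; the TX2's Pascal GPU has 128 CUDA cores per SM, so 2 SMs × 128 = 256 cores:

```python
import torch

props = torch.cuda.get_device_properties(0)
print(props.name)                           # device name, e.g. the Tegra X2
print(props.multi_processor_count)          # SM count; the TX2 reports 2
print(props.total_memory // 2**20, "MiB")   # on Jetson this RAM is shared with the OS

# CUDA cores = SMs x cores-per-SM (128 on Pascal),
# so 2 * 128 = 256 on the TX2 versus 3000+ on a big desktop GPU.
```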
Add to this that cores typically run at lower clocks on embedded devices, since one of the main reasons for using such a device is low power consumption (e.g., more flight time on a drone or other battery-powered unit). A strong PC with a full GPU might consume over 600 watts; compare that to a Jetson running at 20 or 30 watts, or the Nano in its 5 watt mode. Then add the weight of heat sinks and thermal management, and consider putting all of that on a weight-sensitive drone.
Even with all of those disadvantages, there are a lot of cases where you'll be able to use cameras at 60+ fps, even at high resolution, running your pre-trained model and getting an answer on every single frame (higher or lower frame rates depending on a lot of things).
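As a rough way to check whether you're keeping up with the camera, time a capture-plus-inference loop end to end. A sketch with OpenCV (the camera index, frame count, and preprocessing are assumptions; swap in your own model and input pipeline):

```python
import time
import cv2
import torch
import torchvision.models as models

# Placeholder network; load your own trained weights as shown earlier.
model = models.resnet18().to("cuda").eval().half()

cap = cv2.VideoCapture(0)  # camera index is an assumption; adjust for your setup
frames, start = 0, time.perf_counter()

with torch.no_grad():
    while frames < 300:  # sample a few seconds of video
        ok, frame = cap.read()
        if not ok:
            break
        # Resize and convert the BGR frame into the tensor shape the model expects.
        x = cv2.resize(frame, (224, 224))
        x = torch.from_numpy(x).permute(2, 0, 1).float().div(255)
        _ = model(x.unsqueeze(0).to("cuda").half())
        frames += 1

cap.release()
print(f"{frames / (time.perf_counter() - start):.1f} fps end to end")
```

If the number printed is at or above your camera's frame rate, you're answering every frame; if not, that's where optimizations like smaller input sizes or TensorRT come in.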
Even so, you can ask about how to optimize training; perhaps it could run faster. However, you'd probably want to start a new thread and give details of your release versions, what your model is, and so on.