Deep Vision Inference also has the tensorNet base class, which can be used as a generic model. This mirrors the DIGITS approach, which lets users create Image Classification, Object Detection, Segmentation, or “Other” datasets and models.
In addition to using tensorNet directly, you can create your own subclass for keypoints, just as detectNet and segNet do for detection and segmentation. A subclass is a useful place to keep any pre- or post-processing the network requires.
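To make the subclass idea concrete, here is a minimal C++ sketch of the pattern. It does not use the real tensorNet interface: GenericNet, KeypointNet, Keypoint, Infer and Detect are all hypothetical names made up for illustration, and the "inference" step is a stub that returns its input so the example runs without a GPU. It also assumes the network outputs one heatmap per keypoint, which may not match your model. The point is only that the subclass owns the post-processing (here, a per-heatmap argmax).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generic network base class. This is NOT the
// real tensorNet API; it only models "run the network, get raw floats back".
class GenericNet {
public:
    virtual ~GenericNet() = default;

protected:
    // A real implementation would execute the inference engine here.
    // This stub returns the input unchanged so the sketch is runnable.
    virtual std::vector<float> Infer(const std::vector<float>& input) {
        return input;
    }
};

// Hypothetical result type: pixel location and heatmap score of one keypoint.
struct Keypoint {
    int x;
    int y;
    float score;
};

// Task-specific subclass that contains the post-processing for keypoints.
// Assumes the output is numKeypoints heatmaps of width * height floats each,
// stored one after another in row-major order.
class KeypointNet : public GenericNet {
public:
    KeypointNet(std::size_t numKeypoints, std::size_t width, std::size_t height)
        : mNumKeypoints(numKeypoints), mWidth(width), mHeight(height) {
        assert(width > 0 && height > 0);
    }

    // Run the network, then reduce each heatmap to its highest-scoring pixel.
    std::vector<Keypoint> Detect(const std::vector<float>& input) {
        const std::vector<float> heatmaps = Infer(input);
        const std::size_t mapSize = mWidth * mHeight;
        assert(heatmaps.size() == mNumKeypoints * mapSize);

        std::vector<Keypoint> result;
        result.reserve(mNumKeypoints);
        for (std::size_t k = 0; k < mNumKeypoints; ++k) {
            const float* map = heatmaps.data() + k * mapSize;
            std::size_t best = 0;
            for (std::size_t i = 1; i < mapSize; ++i) {
                if (map[i] > map[best]) best = i;
            }
            result.push_back({static_cast<int>(best % mWidth),
                              static_cast<int>(best / mWidth),
                              map[best]});
        }
        return result;
    }

private:
    std::size_t mNumKeypoints;
    std::size_t mWidth;
    std::size_t mHeight;
};
```

Calling code would construct KeypointNet with the heatmap dimensions and call Detect on each frame. In a real subclass, the stubbed Infer call would be replaced by whatever the base class actually provides for running the network.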
The other option, if DIGITS doesn’t fit your type of processing, is to train with Caffe directly. Whichever you train with, TensorRT provides the fastest inference.
You might also want to consider the traditional image feature detection functions, which are available as part of the visionworks and visionworks-sfm packages. This is CUDA-optimized code that returns image “feature points” and can run in real time on high-resolution imagery on the Jetson, with capacity to spare.