Moving past the tutorials: advice for custom object detection

I’m a student (doing a PhD in microbiology) who does not have much experience with the Jetson beyond having completed the Hello AI World tutorial. I’m not a very confident developer and I’m feeling a bit overwhelmed by the implementation of my project. I’ve tried to get my head around TensorRT, CUDA, the Transfer Learning Toolkit (TLT), DeepStream, etc., but at this point I feel a bit lost as to how to proceed.

I’d really appreciate some advice that could point me in the right direction and bridge the gap between the tutorial and implementing a ‘real’ application. Dusty’s guide was great and everything worked fine, but going from that to a real-time inference program with data logging and other tasks is quite difficult for me.

My project is conceptually quite simple: an object localisation and cropping problem. I’m trying to build an open-source water quality tool that can be used to improve public health. Here is a brief description:

I’m using a FLIR USB3 camera hooked up to a microscope.
Microorganisms will pass through the microscope’s field of view rapidly, such that they won’t appear in more than one frame. In each image I want to segment out the organisms, crop them, and save the cropped images (but not the originals) to disk.

I don’t need to classify the organisms in real time as classification is done later using a specialist application which has been scientifically validated.

Within each picture the organisms are quite obvious and visually stand out from the background, but there is smaller detritus, uneven lighting, contrast issues and other challenges that the traditional image processing techniques I’ve tried can’t really deal with.

I therefore think that deep learning approaches are well suited to the problem, and I have had early success using a very simple, shallow (8-layer) fully-convolutional regression network (image-to-image regression). This network takes single points (centroid pixel locations) as training data and outputs an image the same size as the input, representing a density map of organism locations (a heat map), which can then easily be fed to simple thresholding, watershed, etc. algorithms in order to segment and crop the regions where organisms are present.
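To make the post-processing step concrete, here is a minimal sketch of what I mean by turning the heat map into bounding boxes — a plain threshold followed by a connected-component search. The function name, threshold, and minimum-area values are illustrative only (in practice OpenCV or scikit-image would do the labelling far faster):

```python
import numpy as np

def heatmap_to_boxes(heatmap, thresh=0.5, min_area=9):
    """Threshold a density/heat map and return bounding boxes
    (x0, y0, x1, y1) of connected blobs via a simple flood fill.
    Illustrative stand-in for cv2/skimage thresholding + labelling."""
    mask = heatmap >= thresh
    visited = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not visited[sy, sx]:
                # Flood fill to collect one connected component
                stack = [(sy, sx)]
                visited[sy, sx] = True
                ys, xs = [], []
                while stack:
                    y, x = stack.pop()
                    ys.append(y)
                    xs.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            stack.append((ny, nx))
                if len(ys) >= min_area:  # discard small detritus blobs
                    boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes
```

The `min_area` filter is where the smaller detritus could be rejected, since the network should give it much weaker responses than real organisms.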

In terms of implementation, please let me know whether I’m on the right track:
To run this on the Jetson Nano, my understanding is that I will need to train the model on a separate PC, store a frozen graph containing the network description and all the trained weights, convert that to ONNX format, and then build a TRT engine from it.

Then I can use a Python script running on the Jetson to:

  1. Grab a frame from the camera.
  2. Pass the frame to the TRT model and get the output image.
  3. Run the necessary image processing to generate bounding box coordinates.
  4. Use those coordinates to crop ROIs from the original image and
  5. Save those to disk before grabbing the next frame.
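The loop above could be sketched like this — the `grab_frame`, `infer`, `postprocess` and `save_crop` callables are placeholders for the camera API, the TRT engine and the disk I/O, not real library calls:

```python
import numpy as np

def crop_rois(frame, boxes, pad=4):
    """Crop regions of interest from a frame given (x0, y0, x1, y1)
    boxes, with a small padding margin clamped to the image border."""
    h, w = frame.shape[:2]
    crops = []
    for (x0, y0, x1, y1) in boxes:
        x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
        x1, y1 = min(w, x1 + pad), min(h, y1 + pad)
        crops.append(frame[y0:y1, x0:x1].copy())
    return crops

def process_stream(grab_frame, infer, postprocess, save_crop):
    """Main-loop skeleton: the four callables stand in for the camera
    grab, TRT inference, heat-map post-processing and disk writes."""
    idx = 0
    while True:
        frame = grab_frame()
        if frame is None:          # end of stream
            break
        heat = infer(frame)        # frame -> density map
        boxes = postprocess(heat)  # density map -> bounding boxes
        for crop in crop_rois(frame, boxes):
            save_crop(crop, idx)   # write ROI to disk, drop the original
            idx += 1
```

Structuring it this way would also make it easy to swap the inference step (TRT engine vs. a detection network) without touching the capture and saving code.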

Is this right? Is the above possible on a Jetson, or am I limited to the models introduced in the Object Detection part of Hello AI World (e.g. MobileNet SSD)?
Would it make more sense to grab bounding boxes directly using an SSD/YOLO-type detection network (even though the classification head would be redundant)?
Rather than running inference within a Python script, should I instead be using a DeepStream application? (Is it plausible to get a reasonable FPS in real-time inference?)

I understand that my use case is very specific and this post may be misplaced on these forums, but I am unable to find any tutorials or information which address the step of moving from toy problems with a small list of supported models to a more custom solution. I’m also sorry in advance if these questions are too basic. I’m not a programmer and neither are my supervisors. As a total newbie to the Jetson ecosystem, the number of options is a bit over my head. I’m persisting in trying to do this on a Jetson rather than a GPU-equipped PC, though, as the microscope is designed to be used autonomously in environments where money is a limiting factor.

A massive thank you in advance if anyone can point me in the right direction! I’m really excited to be able to get a biological tool built using the Jetson Nano which has been seriously impressive so far!


Since you will need camera input and re-training, it’s recommended to use TLT + DeepStream.

First, please collect a dataset for your problem.
Based on your description, the problem can be framed as either a detection or a segmentation task.
You can find the models supported by the TLT toolkit here:

Object Detection: DetectNet_V2, FasterRCNN, SSD, YOLOv3, RetinaNet, DSSD
Instance Segmentation: MaskRCNN

Once you have a customized model, please check this sample to deploy it on the Jetson Nano.
It is a Python-based, configuration-style example.



Hi and thanks for your reply.

Is it also possible for me to train a model that is not included in TLT, convert it with TensorFlow-TensorRT (following the code here: ), and then run it in DeepStream? Or do you think this is unnecessary?


It’s possible, but it may require some customization.