Object detection with custom dataset

Hi, I am currently trying to train an object detection model following the Hello AI World tutorial. I already have a custom dataset, which I labelled with LabelImg. I followed the tutorial’s steps for retraining the model, but I am getting this error: TypeError: __init__() missing 1 required positional argument: 'dtype'. Kindly help me, as I am a beginner and this is for my school project. I have pasted a screenshot of the error below. Thank you.

Hi @rishivaran95, LabelImg doesn’t typically organize the images in the Pascal VOC directory structure, so I recommend using the CVAT tool (with CVAT, all you need to do is make a labels.txt).

If you download the Pascal VOC dataset, yours should be organized the same way. Or did you already do that?

Hi @dusty_nv, sorry for the late reply. My directory structure is already in the Pascal VOC format (Annotations, JPEGImages, ImageSets); however, when I try to run train_ssd.py from the tutorial, I still get the same error. Are there any changes I have to make since I am using a custom dataset?

Hi @rishivaran95, can you upload your dataset to Google Drive or somewhere so I can take a look?

Hi @dusty_nv, I have uploaded the dataset to Google Drive; here’s the link: https://drive.google.com/drive/folders/1DPf1FE1sY_CeEstLhPwn8vjEov3b2Mic?usp=sharing Thanks.

Thanks @rishivaran95, I requested access to the download.

Hi @rishivaran95, I was able to run the training on your dataset without needing to modify it - however, I found that the training took up more memory than usual because your images are large (5033x3355). So it may be that you were running low on memory. You may want to try these suggestions:

Barring that, you may want to make your training images smaller (but then you would also need to rescale the annotations). It may be easier for you to rescale the annotation bounding box coordinates in the dataloader code than to change the XML files.
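For anyone who does prefer to rewrite the XML files, here is a minimal sketch of the rescaling step. It assumes the usual Pascal VOC annotation layout (a size element plus one bndbox per object) and applies a single scale factor to both. The function name rescale_voc_annotation and the sample annotation are made up for illustration; the same multiplication could instead be applied to the boxes inside the dataloader.

```python
import xml.etree.ElementTree as ET


def rescale_voc_annotation(xml_text, scale):
    """Return Pascal VOC annotation XML with the image size and all boxes multiplied by `scale`."""
    root = ET.fromstring(xml_text)

    # Scale the recorded image dimensions, if present.
    size = root.find("size")
    if size is not None:
        for tag in ("width", "height"):
            node = size.find(tag)
            if node is not None and node.text:
                node.text = str(round(float(node.text) * scale))

    # Scale every bounding box coordinate by the same factor.
    for bndbox in root.iter("bndbox"):
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            node = bndbox.find(tag)
            if node is not None and node.text:
                node.text = str(round(float(node.text) * scale))

    return ET.tostring(root, encoding="unicode")


# Hypothetical annotation for a 5033x3355 image (values invented for the example).
sample = """<annotation>
  <size><width>5033</width><height>3355</height><depth>3</depth></size>
  <object>
    <name>example</name>
    <bndbox><xmin>1000</xmin><ymin>500</ymin><xmax>3000</xmax><ymax>2500</ymax></bndbox>
  </object>
</annotation>"""

print(rescale_voc_annotation(sample, 0.2))
```

The scale passed here has to match the factor used when resizing the image itself, otherwise the boxes will no longer line up with the objects.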

Hi @dusty_nv, I tried resizing a few images and training on them, and it seems to work, even though I was prompted with a low memory warning. I believe my issue was due to the images being too big, so I will resize my whole dataset and try training again. Thanks for helping me resolve this issue. I will try out the suggestions you recommended. Do you have any suggestions on resizing the images without compromising their quality? Thank you.

Hi @dusty_nv, I am able to run the training on my custom dataset. However, I now have an issue with the live detection. I have already converted the model to ONNX and tried to run the live program based on your tutorial. Unfortunately, I am getting this error:

INVALID_ARGUMENT: Cannot find binding of given name: data
[TRT] failed to find requested input layer data in network
[TRT] device GPU, failed to create resources for CUDA engine
[TRT] failed to create TensorRT engine for models/test/ssd-mobilenet.onnx, device GPU
[TRT] detectNet -- failed to initialize.
detectnet: failed to load detectNet model

I have attached a screenshot of the error. I hope to hear from you soon on how to fix this issue. Thanks.

The images get downsized by PyTorch to 300x300 before being fed into the network anyway, so I wouldn’t worry about the quality. This is because the DNNs run at a lower resolution. Although the images are downsized during the PyTorch pre-processing, it appears that just loading those large 5033x3355 images consumes a lot of memory, so you were correct to downsample them beforehand too.
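As a rough illustration of choosing a downsample factor, the sketch below computes a new size that keeps the aspect ratio while capping the longer side. The 1024-pixel cap and the function name downsample_size are assumptions picked for the example, not values from the tutorial. The returned scale is the factor that would also need to be applied to the bounding boxes.

```python
def downsample_size(width, height, max_side=1024):
    """Return (new_width, new_height, scale) so the longer side is at most max_side."""
    # Never upscale: images already below the cap keep their size.
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale), scale


# Image size reported in this thread.
new_w, new_h, scale = downsample_size(5033, 3355)
print(new_w, new_h, scale)
```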

It appears to be looking for the wrong layer name. What’s the command line you are running it with? Did you specify these arguments?

--model=models/test/ssd_mobilenet.onnx --labels=models/test/labels.txt --input-blob=input_0 --output-cvg=scores --output-bbox=boxes

Hi @dusty_nv, yes, I did specify those args:

detectnet --model=models/test/ssd-mobilenet.onnx --labels=models/test/labels.txt --input-blop=input_0 --output-cvg=scores --output-bbox=boxes /dev/video0

I am able to run the fruit detection tutorial with the same command line too.

Hi @rishivaran95, there appears to be a typo here: --input-blop should be --input-blob
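Judging from the error message, the misspelled flag was not applied, so the program still looked for an input layer named data. A small pre-check along the following lines can catch this kind of typo before launching. It is only a sketch: the list of known flags is copied from the command line in this thread and is not a complete list of detectnet options, and find_suspect_flags is a made-up helper name.

```python
import difflib

# Flags taken from the command line quoted in this thread; not a complete list.
KNOWN_FLAGS = ["--model", "--labels", "--input-blob", "--output-cvg", "--output-bbox"]


def find_suspect_flags(argv, known=KNOWN_FLAGS):
    """Return (given, suggestion) pairs for flags that are not in `known`."""
    suspects = []
    for arg in argv:
        if not arg.startswith("--"):
            continue  # positional argument such as /dev/video0
        name = arg.split("=", 1)[0]
        if name in known:
            continue
        # Suggest the closest known flag, if any is similar enough.
        close = difflib.get_close_matches(name, known, n=1)
        suspects.append((name, close[0] if close else None))
    return suspects


args = [
    "--model=models/test/ssd-mobilenet.onnx",
    "--labels=models/test/labels.txt",
    "--input-blop=input_0",
    "--output-cvg=scores",
    "--output-bbox=boxes",
    "/dev/video0",
]
for given, suggestion in find_suspect_flags(args):
    print(f"unknown flag {given}, did you mean {suggestion}?")
```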

Hi @dusty_nv, right, it was a typo on my end. It works now, thank you for the clarification.

Hi @dusty_nv, how can I extract the coordinates of the bounding boxes when it detects an object?

Hi @rishivaran95, after you get the detections, you can get the bounding box from them like this:

detections = net.Detect(img)

for detection in detections:
    # coordinates of the box edges, in the order left, top, right, bottom
    bounding_box = (detection.Left, detection.Top, detection.Right, detection.Bottom)

net.Detect() returns a list of detectNet.Detection objects, which have the following members:

Detection = <type 'jetson.inference.detectNet.Detection'>
Object Detection Result
 
----------------------------------------------------------------------
Data descriptors defined here:
 
Area
    Area of bounding box
 
Bottom
    Bottom bounding box coordinate
 
Center
    Center (x,y) coordinate of bounding box
 
ClassID
    Class index of the detected object
 
Confidence
    Confidence value of the detected object
 
Height
    Height of bounding box
 
Instance
    Instance index of the detected object
 
Left
    Left bounding box coordinate
 
Right
    Right bounding box coordinate
 
Top
    Top bounding box coordinate
 
Width
    Width of bounding box

Hi @dusty_nv, when I try print(detections), I don’t get the output shown above. How do I rectify this? I am able to print the bounding box coordinates, but I would like to tell which class is being detected. Thank you.

Hi @rishivaran95, sorry for the delay - instead of printing the entire detections list, try printing each detection object individually:

for detection in detections:
    print(detection)
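To pair each box with a readable class name rather than printing the whole object, something like the sketch below may help. It uses only the Detection members listed earlier (ClassID, Confidence, Left, Top, Right, Bottom). The stand-in objects and the labels list are invented so the example runs without a Jetson; on the device, the detections would come from net.Detect(img), and net.GetClassDesc is assumed here to be the call that maps a class ID to its label.

```python
from types import SimpleNamespace


def describe_detection(detection, class_desc):
    """Return (class name, confidence, box) for one detection."""
    box = (detection.Left, detection.Top, detection.Right, detection.Bottom)
    return class_desc(detection.ClassID), detection.Confidence, box


# Stand-ins so the sketch runs off-device; on a Jetson, `detections` would come
# from net.Detect(img) and `class_desc` would be net.GetClassDesc (assumed).
labels = ["BACKGROUND", "example_class"]
fake_detections = [
    SimpleNamespace(ClassID=1, Confidence=0.9, Left=10.0, Top=20.0, Right=110.0, Bottom=220.0)
]

for det in fake_detections:
    name, confidence, box = describe_detection(det, lambda i: labels[i])
    print(name, confidence, box)
```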

Hi @dusty_nv, thanks for the reply. I am able to get the IDs of the classes now.

Hi @dusty_nv, I would like to find out how to publish these coordinates in ROS.