Heyho,
I’d like to use the jetson-inference library for a project, and I’m stuck on using already-installed IP cameras with a 1440p image format. I’m trying this workaround for RTSP image input.
Do I have to preprocess the images from the camera before I can label them? I plan to label them using labelImg, like someone does in this post.
Does the transfer-learning feature of jetson-inference take images of any resolution and do the resizing by itself?
Same question for the camera input: do I have to resize it?
Hi @CN_Jetter, it automatically handles the resizing internally (both for training and inference), so it doesn’t matter what the camera resolution is. For example, with SSD-Mobilenet the video feed will automatically be resized down to 300x300 before being fed into the DNN (you don’t need to worry about this or take it into consideration during labelling).
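To illustrate why the annotation resolution doesn’t matter, here’s a quick sketch (plain Python, with hypothetical box coordinates) of the scaling that effectively happens when a 1440p frame is squashed to a 300x300 network input: the bounding boxes scale along with the image, so their relative positions are preserved.

```python
def scale_box(box, src_w, src_h, dst_w=300, dst_h=300):
    """Map a bounding box (x1, y1, x2, y2) from the source frame
    resolution to the resized DNN input resolution."""
    sx, sy = dst_w / src_w, dst_h / src_h
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# A (hypothetical) box annotated on a 2560x1440 frame...
box_1440p = (640, 360, 1280, 720)
# ...lands proportionally in the 300x300 network input:
print(scale_box(box_1440p, 2560, 1440))  # (75.0, 75.0, 150.0, 150.0)
```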
After trying labelImg and CVAT, I prefer the CVAT tool because it exports the dataset with the Pascal VOC directory structure, whereas labelImg just gives you a bunch of XML files. In theory either can be made to work, though.
I was kind of put off by the rather complex interface of CVAT (I couldn’t find the screen where I can create the bboxes with the labels).
I should give it a second chance. Thank you for pointing out the benefits of CVAT.
Yea, the CVAT interface is more complex. With any tool, I recommend doing a small subset of annotations first (say, just 10 images), and then making sure you can run that through train_ssd.py before you spend a lot of time annotating.
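Before running that small subset through train_ssd.py, a quick offline check can catch the most common failure: class names appearing in the annotation XMLs that aren’t in labels.txt. Here’s a minimal sketch (stdlib only; the function names are mine, not part of jetson-inference):

```python
import xml.etree.ElementTree as ET

def classes_in_annotation(xml_text):
    """Return the set of class names used in one Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    return {obj.findtext("name") for obj in root.iter("object")}

def missing_labels(xml_texts, labels):
    """Report class names used in the annotations but absent from
    labels.txt -- a common reason train_ssd.py chokes on a dataset."""
    used = set().union(*(classes_in_annotation(t) for t in xml_texts))
    return sorted(used - set(labels))

# Hypothetical annotation with two classes, but labels.txt only lists one:
sample = """<annotation>
  <object><name>cat</name></object>
  <object><name>dog</name></object>
</annotation>"""
print(missing_labels([sample], ["cat"]))  # ['dog']
```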
The only modification needed with CVAT was to make a labels.txt for the dataset (the class names, one per line). The text file it exports at the root level of the dataset is not that.
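If you’d rather not type labels.txt by hand, it can be generated from the exported Pascal VOC XML files. A hedged sketch (stdlib only; assumes the usual `Annotations/` directory of the VOC export, and the helper name is mine):

```python
import glob
import os
import xml.etree.ElementTree as ET

def write_labels(annotations_dir, out_path):
    """Collect every class name used in the Pascal VOC XML files under
    annotations_dir and write them to out_path, one name per line."""
    names = set()
    for xml_file in glob.glob(os.path.join(annotations_dir, "*.xml")):
        for obj in ET.parse(xml_file).getroot().iter("object"):
            names.add(obj.findtext("name"))
    with open(out_path, "w") as f:
        f.write("\n".join(sorted(names)) + "\n")
    return sorted(names)
```

Usage would be something like `write_labels("my-dataset/Annotations", "my-dataset/labels.txt")`; sorting keeps the class order stable across re-exports.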