DIGITS Automatic Train/Val Split

jtillett86 · April 30, 2018, 5:58pm

I’m trying to load a dataset into DIGITS for object detection. The dataset is in KITTI format, but the folder structure is not split into separate training and validation folders. I read here (DIGITS/ImageFolderFormat.md at master · NVIDIA/DIGITS · GitHub) that DIGITS can automatically split your dataset for you. Unfortunately, I can’t get this to work. If I leave the validation folder fields blank, I get the error “Folder does not exist or is not reachable” and can’t proceed. If I enter the same folders as the training data, I get that same error for both the training and validation fields. Any idea how to get DIGITS to split the dataset automatically?

tcyang · April 30, 2018, 7:41pm

I believe this behavior is by design. DIGITS does not automatically split object detection datasets into training and validation sets. In many cases the frames are extracted from videos, and if DIGITS split them at random, frames from the same video would likely end up in both sets. Since frames from the same video tend to be very similar, the validation accuracy would then be artificially inflated. It’s better for users to split the dataset themselves, based on what they know about the source of the frames: for example, divide the video files into two disjoint sets, then extract frames from one set to form the training dataset and from the other to form the validation dataset.
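
If the frames are already extracted, the same idea can be applied by grouping them by source video before splitting. Below is a minimal Python sketch, not anything DIGITS provides. It assumes each frame's file stem starts with its video ID followed by an underscore (e.g. `vid03_000123`), and that images and labels sit in `images/` and `labels/` subfolders with `.png` and `.txt` extensions; those names, and the helper functions `split_by_source` and `copy_split`, are hypothetical and should be adjusted to your dataset.

```python
import random
import shutil
from collections import defaultdict
from pathlib import Path


def split_by_source(frame_stems, source_of, val_fraction=0.2, seed=0):
    """Split frame stems so that no source video contributes to both sets.

    source_of maps a frame stem to its source ID (an assumption about
    how your files are named).
    """
    groups = defaultdict(list)
    for stem in frame_stems:
        groups[source_of(stem)].append(stem)

    # Shuffle whole videos, not individual frames.
    sources = sorted(groups)
    random.Random(seed).shuffle(sources)

    # Hold out at least one video when there is more than one to choose from.
    n_val = max(1, round(len(sources) * val_fraction)) if len(sources) > 1 else 0
    val_sources = set(sources[:n_val])

    train = [s for src in sources if src not in val_sources for s in groups[src]]
    val = [s for src in sources if src in val_sources for s in groups[src]]
    return train, val


def copy_split(src_root, dst_root, train, val, image_ext=".png", label_ext=".txt"):
    """Copy images and labels into <dst_root>/{train,val}/{images,labels}."""
    for subset, stems in (("train", train), ("val", val)):
        for kind, ext in (("images", image_ext), ("labels", label_ext)):
            out_dir = Path(dst_root) / subset / kind
            out_dir.mkdir(parents=True, exist_ok=True)
            for stem in stems:
                shutil.copy2(Path(src_root) / kind / (stem + ext), out_dir / (stem + ext))
```

To use it, collect the stems with something like `sorted(p.stem for p in Path(src_root, "images").glob("*.png"))`, call `split_by_source(stems, lambda s: s.split("_")[0])`, and pass the two lists to `copy_split`. The four resulting folders can then be entered in the training and validation image/label fields. Note that `val_fraction` applies to the number of videos, not frames, so the frame counts will only roughly match that ratio when videos differ in length.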