Hello, I just wanted to see if there was a faster alternative to downloading the ILSVRC12 dataset; the tutorial (Two Days to a Demo) mentions it should take overnight on a decent Internet connection. I am guessing the author must have had a T-1 connection, as it is apparently going to take ~5 days to download all the images on mine.
My system has other jobs to do, and I can't tie it up for an uninterrupted 5-day download session. Is there at least a way to download the dataset in chunks so I don't have to start from the beginning again?
The crawler script reads a file of image URLs, so you could remove the URLs that you already have, or the URLs that you aren't interested in (either in the file or by modifying the script). There also appears to be a recent project that crawls Google Images using Python.
If you would prefer to use your own images, it's easy to collect images for the imageNet example from 2 Days to a Demo in DIGITS image directory format: save a bunch of images of each class you want to recognize into a folder named after that class.
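As a sketch, the expected layout is one subdirectory per class, each holding that class's images; the class names below are just placeholders for your own categories:

```python
import os

# Hypothetical class names -- substitute the categories your application needs.
classes = ["widget", "gasket"]
root = "train"

# DIGITS image directory format: root/<class_name>/<images...>
for name in classes:
    os.makedirs(os.path.join(root, name), exist_ok=True)

# Resulting layout:
# train/
#   widget/   <- images of widgets go here
#   gasket/   <- images of gaskets go here
```

DIGITS infers the class labels from the folder names when you create the dataset.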
Finally, you can skip ahead to the detectNet or segNet portion of the tutorial while the crawler script runs, if you prefer.
That's a great point; I was looking at the datasets and wasn't really interested in identifying birds, etc. Our application is much more specific. As far as collecting our own images goes, is it fair to say that it doesn't matter what kind of crawler you use to get them, as long as they end up arranged in the expected file/folder hierarchy?