Downloading ILSVRC12 dataset

Hello, I wanted to see whether there is a faster alternative for downloading the ILSVRC12 dataset. The tutorial (Two Days to a Demo) mentions that it should take overnight on a decent Internet connection. I am guessing the author must have had a T-1 connection, as it is apparently going to take ~5 days to download all the images on my connection.

My system has other jobs to do, and I can’t tie it up for an uninterrupted 5-day download session. Is there at least a way to download the dataset in chunks, so that I don’t have to start from the beginning again after an interruption?


You could try wget’s --continue (-c) option, which resumes a partially downloaded file instead of starting over:

wget -c your_target_url
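
For reference, wget -c resumes by asking the server for only the missing bytes through an HTTP Range request. Below is a minimal Python sketch of the same idea, assuming the server supports range requests. The function names range_header and resume_download are hypothetical, not part of any tutorial script.

```python
import os
import urllib.request


def range_header(partial_path):
    """Return the HTTP Range header needed to resume partial_path, or {} to start fresh."""
    size = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={size}-"} if size > 0 else {}


def resume_download(url, dest, chunk_size=1 << 20):
    """Download url to dest, appending to any partial file already on disk."""
    request = urllib.request.Request(url, headers=range_header(dest))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range request; anything else
        # (typically 200) means it is sending the whole file again.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

If the file on disk is already complete, a server will usually answer with 416 and urlopen raises an HTTPError, so a real script would want to catch that case.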

Hi, you could join if applicable (see download-faq).

The crawler script reads a file of image URLs, so you could remove the URLs that you already have or that you aren’t interested in (either in the file or by modifying the script). There is also a recent project that crawls Google Images using Python.
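
Removing the URLs you already have can be sketched roughly as follows. This assumes one URL per line and that each image was saved under the last path component of its URL; the actual crawler script may use a different list format or naming scheme, and remaining_urls is a hypothetical helper.

```python
import os
from urllib.parse import urlparse


def remaining_urls(urls, downloaded_dir):
    """Return the URLs whose file name is not yet present in downloaded_dir."""
    have = set(os.listdir(downloaded_dir)) if os.path.isdir(downloaded_dir) else set()
    todo = []
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        # Assumption: the local file name equals the URL's last path component.
        name = os.path.basename(urlparse(url).path)
        if name not in have:
            todo.append(url)
    return todo
```

You could read the URL file with open(...).readlines(), pass the lines through this function, and write the result to a new file for the crawler to use on the next run.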

If you would prefer to use your own images, it is pretty easy to collect them for the imageNet example from 2 Days to a Demo in the DIGITS image directory format: save a bunch of images of what you want to recognize into a folder named after each class.

Finally, you can skip ahead to the detectNet or segNet portion of the tutorial while the crawler script runs, if you prefer.

Thanks for the suggestions. I also found an academic torrent site that might speed things up.

That’s a great point; I was looking at the datasets and wasn’t really interested in identifying birds, etc. Our application is much more specific. As far as collecting our own images, is it fair to say that it really doesn’t matter what kind of crawler you use to get them, as long as they are arranged in the expected file/folder hierarchy?

Thanks again!

Hi jhuds65, yes, you are correct. As long as you organize your images in a directory structure like so, you can use any domain-specific images you like for your application:

+ cat/
   - cat_0.jpg
   - other-cat.jpg
   - any_file-name_OK.png
   - cat_N.jpg
+ dog/
   + chihuahua/
       - woof_woof.png
       - subdirectories_are_ok.jpg
   + labrador/
       - they_get_flattened_into_Dog_class.jpg
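
To sanity-check a folder before importing it, something like the sketch below could list which images end up in which class, with subdirectories folded into their top-level class as in the layout above. The extension set is an assumption for illustration, not the list DIGITS actually accepts, and images_by_class is a hypothetical helper.

```python
import os

# Assumption: a typical set of image extensions, not DIGITS' official list.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def images_by_class(root):
    """Map each top-level folder under root to all image files beneath it."""
    classes = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # loose files in root do not belong to any class
        files = []
        # os.walk descends into subdirectories, so nested folders are
        # flattened into the top-level class.
        for dirpath, _dirnames, filenames in os.walk(class_dir):
            for name in filenames:
                if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                    files.append(os.path.join(dirpath, name))
        classes[entry] = sorted(files)
    return classes
```

Printing len(files) for each class is a quick way to spot empty or badly unbalanced classes before training.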

That’s great, and should be enough to get me off to a good start.

Again, thanks to everyone for their help!

Now, off to build something cool (I hope).