I have a project where I record images of roads around my home along with their positions (latitude, longitude), train a deep neural network using NVIDIA DIGITS, and finally predict the positions of newly taken images using the trained model. I have already recorded almost 400k images with over 400 labels (positions). The problem is that I don't know which architecture to use for training the network. Is this a classification problem? Since there are so many different labels, the first approach that came to my mind was dividing the roads into regions (to be used as classes) and assigning each image to the region it belongs to before training. Accuracy will be low, but I don't know what else to do. Could other architectures, such as an autoencoder, a regression network, or a Siamese network, be used to get accurate results without this preprocessing (merging labels into regions)?
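To make the region idea concrete, here is a minimal sketch of how I imagine the preprocessing step: snap each (latitude, longitude) label onto a fixed grid and use the cell index as the class label. The bounding box, grid size, and coordinates below are purely illustrative, not my real data.

```python
# Hypothetical bounding box around the recorded area (illustrative values only)
LAT_MIN, LAT_MAX = 40.70, 40.80
LON_MIN, LON_MAX = 29.90, 30.05
GRID = 10  # split each axis into 10 cells -> up to 100 region classes

def region_class(lat, lon):
    """Map a (lat, lon) position to a grid-cell index used as a class label."""
    # Normalize each coordinate into [0, 1), scale to the grid,
    # and clamp so points on the max edge stay inside the last cell.
    row = min(int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * GRID), GRID - 1)
    col = min(int((lon - LON_MIN) / (LON_MAX - LON_MIN) * GRID), GRID - 1)
    return row * GRID + col

# Example: turn one image's recorded position into a class label
print(region_class(40.75, 29.97))  # -> 54 (row 5, column 4)
```

With labels produced this way the task becomes an ordinary image classification problem, at the cost of quantizing the position to the cell size.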