Image Segmentation Using DIGITS 5

jwitsoe · November 11, 2016, 5:51pm

Originally published at: https://developer.nvidia.com/blog/image-segmentation-using-digits-5/

Today we’re excited to announce NVIDIA DIGITS 5. DIGITS 5 comes with a number of new features, two of which are of particular interest for this post: a fully-integrated segmentation workflow, allowing you to create image segmentation datasets and visualize the output of a segmentation network, and the DIGITS model store, a public online repository…

anon98784321 · November 12, 2016, 8:19pm

Any plans to support Theano or TensorFlow?

anon35911335 · November 15, 2016, 9:51am

Can we use this for SAR image segmentation?

anon96632069 · November 15, 2016, 11:18am

Hi! I have also worked with FCN-8s for image segmentation with good results. The only problem is the long time required for both training and inference. Any idea how to tackle this without losing accuracy?

anon28844414 · November 15, 2016, 8:53pm

Hello Nerea, technology is moving fast so chances are the newer generations of GPUs would meet your processing time requirements. Did you try any of the GPUs from the Pascal family? Also, Fully Convolutional Networks scale extremely well to multi-GPU training so you might want to consider adding GPUs to reduce training time. For inference you can use TensorRT (https://developer.nvidia.co... to reduce test time.

Another option would be to base your network on a different CNN: FCN-8s is based on VGG-16, but you could use e.g. AlexNet instead and add skip connections if what you're interested in is a finer grain in the predictions.

Alternatively, you could increase the stride of the first few convolutional layers to increase their receptive field and reduce the number of activations in the network thus reducing the computational complexity.

Yet another option is to reduce the image size if you feel this won't destroy too much content.

These are some thoughts but there are probably a million ways to tackle this problem.

anon28844414 · November 15, 2016, 8:56pm

Hello, I have found FCN models to work very well for very different datasets (natural images, synthetic images, medical images). I suggest you try and please let us know the result!

anon28844414 · November 17, 2016, 10:35am

We would love to support more frameworks in the future. DIGITS is open-source, feel free to contribute!

anon45123310 · November 21, 2016, 3:45pm

I'm having trouble understanding what exactly is meant by offset in Table 2 and why it is calculated as (P - (K - 1)/2) / S.

I understand that the new feature map size could be calculated as (W + 2P - K) / S + 1 where W is the input size. So for example, conv 1 with W = 224, P = 100, K = 11 and S = 4 would result in size 104. How does that relate to offset as defined above?
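For reference, the size calculation above can be checked with a few lines of Python. This is only a sketch: the function name `conv_output_size` is made up for illustration, and it assumes the division is rounded down, which is what the 104 in the example implies.

```python
def conv_output_size(w, k, s, p):
    """Output size along one spatial dimension: floor((W + 2P - K) / S) + 1."""
    return (w + 2 * p - k) // s + 1


# conv1 from the example: W = 224, P = 100, K = 11, S = 4
conv1_size = conv_output_size(w=224, k=11, s=4, p=100)
print(conv1_size)
```

With the numbers from the example, (224 + 200 - 11) / 4 = 103.25, which rounds down to 103, and adding 1 gives 104.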

anon28844414 · November 22, 2016, 2:13pm

Consider conv1 for example: because this layer has 100-pixel padding on each side, its output is larger than if there were no padding at all. Therefore, if you were to upscale the output of conv1 to reconstitute an image of the original input, you would have to crop the upscaled output. In a lot of cases a center crop will do, but if you want to calculate the offset incurred by each conv or pooling layer you can use the (P-(K-1)/2)/S formula. In practice you need to do this because the Caffe "crop" layer requires an offset and a shape; it won't do the center crop automatically.

Interestingly, your question made me realize I had made a mistake in the offset calculations. The number is correct for each layer, but the offsets do not add up the way I showed. Amazingly, the final answer is correct. I will fix this in the article but in the meantime I suggest you have a look at https://github.com/BVLC/caf...

anon45123310 · November 22, 2016, 7:14pm

I figured out the general intention, but I couldn't get the numbers to match and it is still not clear to me. In this example, 224x224 image with padding 100 on each side would result in 104x104 feature maps after conv1 (as given by W' = (W + 2P - K) / S + 1). If we upscale with deconvolution using stride 4, I am expecting a 416x416 image. With offset 23 in Table 2 and cropping 23 pixels at each side, this gives 370x370. What am I missing here?

When I think about it some more, the actual size after deconvolution should also depend on used kernel size, but I'm not sure if I could just use the above formula for size with known W' (104) and solve for W.

anon28844414 · November 22, 2016, 9:14pm

Let's assume your input is 224*224. The intrinsic offset of conv1 is (P-(K-1)/2)/S = (100-(11-1)/2)/4 = 23.75. The output of conv1 has size 104*104.

In the article I omitted to say that a deconvolution layer yields an intrinsic offset of (K-1)/2-P so in your example if you want to upscale conv1 using stride S=4 and kernel size K=7 this would be 3. The size of the output of the upscale layer would be (W-1)*S-2P+K=(104-1)*4-0+7=419 (for each spatial dimension).

The update I need to make in the article is to fix the recipe for composing those offsets across layers. We can't simply add them up. We need to "back propagate" the offset, from the top of the graph to the bottom of the graph. The composition of a layer L1 with a layer L2 (i.e. L2 is a bottom of L1 in Caffe terminology), with offsets O1 and O2 respectively, yields an offset of O2/F+O1, where F is the cumulative scaling factor of L2.

So now in our example we have:
- offset of upscale layer: 3
- offset of conv1 layer: 23.75
- scaling factor of conv1: 1/4

- total offset of composition of upscale with conv1: 23.75/(1/4)+3=98

This means that you need to take 98 pixels off each border of the output of the upscale layer => you end up with 419-98*2=223 pixels. Adjusting for rounding errors due to integer kernel sizes this is exactly what you need.
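The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical and are not part of Caffe or DIGITS; the code only encodes the formulas quoted in this thread and uses exact fractions so that 23.75 does not pick up floating-point noise.

```python
from fractions import Fraction


def conv_offset(k, s, p):
    """Intrinsic offset of a conv or pooling layer: (P - (K - 1)/2) / S."""
    return (Fraction(p) - Fraction(k - 1, 2)) / s


def deconv_offset(k, p):
    """Intrinsic offset of a deconvolution (upscale) layer: (K - 1)/2 - P."""
    return Fraction(k - 1, 2) - p


def deconv_output_size(w, k, s, p):
    """Output size of a deconvolution layer: (W - 1) * S - 2P + K."""
    return (w - 1) * s - 2 * p + k


def compose_offsets(o1, o2, f):
    """Offset of layer L1 stacked on its bottom L2: O2 / F + O1,
    where F is the cumulative scaling factor of L2."""
    return o2 / f + o1


# conv1: K = 11, S = 4, P = 100, output 104 x 104, scaling factor 1/4
o_conv1 = conv_offset(k=11, s=4, p=100)        # 95/4, i.e. 23.75
# upscale layer: K = 7, S = 4, P = 0
o_up = deconv_offset(k=7, p=0)                 # 3
up_size = deconv_output_size(w=104, k=7, s=4, p=0)

total_offset = compose_offsets(o_up, o_conv1, Fraction(1, 4))
cropped_size = up_size - 2 * total_offset

print(float(o_conv1), up_size, int(total_offset), int(cropped_size))
```

For these values the script yields an upscaled size of 419, a total offset of 98, and a cropped size of 223, matching the numbers worked out by hand.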

anon45123310 · November 29, 2016, 9:56am

Thanks for a detailed explanation! :)

anon20118744 · December 11, 2016, 3:57am

Is the public DIGITS Model Store available now?

anon28844414 · December 11, 2016, 8:53pm

Sorry, the public DIGITS Model Store is not available yet.

anon7244159 · December 21, 2016, 4:18pm

Interesting. There is a formatting mistake near "For example, consider conv1: since"

anon46987130 · December 25, 2016, 7:24am

Hello Greg, as demonstrated in the paper you mentioned above
(Ros_The_SYNTHIA_Dataset_CVPR_2016_paper.pdf),
to tackle domain shift, the best results are obtained using balanced gradient contribution (BGC), which consists of creating batches with images from both the SYNTHIA (synthetic) and real-image datasets. What are the practical implementation pros and cons with respect to transfer learning? Sorry for the general question, I am very new to these topics... still learning :-)

anon28844414 · December 31, 2016, 1:31pm

Hello Filippo. In this paper, section 4.3 mentions that feature extractors (also called "contraction blocks") are initialized from the corresponding "base" CNN, pre-trained on ILSVRC. This is what I did too.

BGC comes into play when studying the benefits of using the synthetic dataset during training before deploying the network on real images. Admittedly I don't have experience with this. Unless I am mistaken the paper does not give quantitative proof that BGC performs better than fine-tuning as only the BGC results are given.

anon28844414 · January 5, 2017, 11:06am

Thanks for letting us know, this is fixed now.

anon84242827 · January 14, 2017, 10:39am

Thank you for this great article! Is there any chance to get your pretrained model with FCN-8s on SYNTHIA? Or is there a way to get at least some of the models from the model store? Thanks and greetings

anon64930771 · January 16, 2017, 10:10am

Does anyone know when DIGITS 5 will be available on Amazon as an AMI?

Currently only 4 is available: https://aws.amazon.com/mark...
