Integrating Tensorboard images into detectNet SSD training

Hey,

I have put some TensorBoard code into train_ssd.py from Dusty's repo, and it was pretty straightforward at first. My graphs need smoothing, but at least they appear!

The main issue I am having is working out which parts of the results returned by the validation inference go where.

I need to take the ground truths and predictions and draw them side by side in TensorBoard, as in the example below:

This is a screenshot from my training with the TensorFlow Object Detection API. It uses visualization techniques that are different from detectNet's.

So, the code I have added so far:

import datetime

from torch.utils.tensorboard import SummaryWriter

# Writer will output to the ./runs/ directory by default;
# the comment suffix makes each run's log directory unique.
current_day = datetime.date.today()
current_moment = datetime.datetime.now()
current_time = current_moment.strftime("%H-%M-%S")
board_name = f'{current_day}_{current_time}'
tb = SummaryWriter(comment=board_name)

def train(loader, net, criterion, optimizer, device, debug_steps=100, epoch=-1):
    net.train(True)
    running_loss = 0.0
    running_regression_loss = 0.0
    running_classification_loss = 0.0
    running_correct = 0
    for i, data in enumerate(loader):
        images, boxes, labels = data
        images = images.to(device)
        boxes = boxes.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        confidence, locations = net(images)
        regression_loss, classification_loss = criterion(confidence, locations, labels, boxes)  # TODO CHANGE BOXES
        loss = regression_loss + classification_loss
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        running_regression_loss += regression_loss.item()
        running_classification_loss += classification_loss.item()
        running_correct += (locations == boxes).sum().item()
        if i and i % debug_steps == 0:
            avg_loss = running_loss / debug_steps
            avg_reg_loss = running_regression_loss / debug_steps
            avg_clf_loss = running_classification_loss / debug_steps
            logging.info(
                f"Epoch: {epoch}, Step: {i}/{len(loader)}, " +
                f"Avg Loss: {avg_loss:.4f}, " +
                f"Avg Regression Loss {avg_reg_loss:.4f}, " +
                f"Avg Classification Loss: {avg_clf_loss:.4f}"
            )
            # TensorBoard scalar entries
            tb.add_scalar('Avg Loss', avg_loss, epoch)
            tb.add_scalar('Avg Regression Loss', avg_reg_loss, epoch)
            tb.add_scalar('Avg Classification Loss', avg_clf_loss, epoch)
            tb.flush()

            running_loss = 0.0
            running_regression_loss = 0.0
            running_classification_loss = 0.0

    tb.add_scalar('Training Loss', running_loss / len(data), epoch)
    tb.add_scalar('Training Accuracy', running_correct / len(data), epoch)


def test(loader, net, criterion, device):
    net.eval()
    running_loss = 0.0
    running_regression_loss = 0.0
    running_classification_loss = 0.0
    num = 0
    for _, data in enumerate(loader):
        images, boxes, labels = data
        images = images.to(device)
        boxes = boxes.to(device)
        labels = labels.to(device)
        num += 1

        with torch.no_grad():
            confidence, locations = net(images)
            regression_loss, classification_loss = criterion(confidence, locations, labels, boxes)
            loss = regression_loss + classification_loss
            #TODO: write images to tensorboard of detections and ground truths

        running_loss += loss.item()
        running_regression_loss += regression_loss.item()
        running_classification_loss += classification_loss.item()
    return running_loss / num, running_regression_loss / num, running_classification_loss / num

What the current output looks like:

Some of the scalars do not look right to me…

I am pretty much working from the official docs and whatever I can find online, but I cannot find many people out there doing this particular thing with detectNet :/

https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html
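The basic image-logging pattern from that tutorial seems simple enough; what I can't work out is how to get annotated detection images into it:

    import torchvision

    # images: an (N, C, H, W) tensor batch, e.g. straight from the validation loader
    img_grid = torchvision.utils.make_grid(images)
    tb.add_image('validation_images', img_grid, epoch)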

A pointer in the right direction on how to extract what I need from the results in test() to write to the board would be greatly appreciated!

Once I have finished integrating it properly, I will post the full code for others.

EDIT: I have read through the visualization code in the TensorFlow Object Detection API: https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py

as well as detectnet.cpp etc., and I can see that getting the original ground-truth boxes drawn and displayed could be difficult. However, even getting just the detection images with boxes written to TensorBoard, without the originals next to them, would be great. It makes it so much easier to scroll back through points in training and work out the sweet spots.
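For reference, this is roughly the kind of thing I imagine needing once I have pixel-space boxes (a minimal sketch, assuming a torchvision recent enough to have draw_bounding_boxes; image_u8, gt_boxes and pred_boxes are placeholders for exactly the things I am still unsure how to extract):

    import torch
    from torchvision.utils import draw_bounding_boxes

    # image_u8:   (3, H, W) uint8 tensor of the un-normalized input image
    # gt_boxes:   (N, 4) tensor of ground-truth boxes in pixel (xmin, ymin, xmax, ymax)
    # pred_boxes: (M, 4) tensor of predicted boxes in the same format
    gt_img = draw_bounding_boxes(image_u8, gt_boxes, colors="green")
    pred_img = draw_bounding_boxes(image_u8, pred_boxes, colors="red")

    # concatenate along the width for the side-by-side view, then log it
    side_by_side = torch.cat([gt_img, pred_img], dim=2)
    tb.add_image('val/ground_truth_vs_prediction', side_by_side, epoch)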

Hi @tom.parkerete4t, do the values in the graphs sync up with the values you see printed to the terminal? A couple of observations to point out:

  1. The ‘Avg Loss’, ‘Avg Regression Loss’, and ‘Avg Classification Loss’ are expected to be noisy because they are logged every N batches (where N=debug_steps), and the loss fluctuates from batch to batch during the epoch. The statistics taken at the end of each epoch should be less noisy.

  2. These running loss variables are getting reset to 0 every debug_steps:

            running_loss = 0.0
            running_regression_loss = 0.0
            running_classification_loss = 0.0

    tb.add_scalar('Training Loss', running_loss / len(data), epoch)
    tb.add_scalar('Training Accuracy', running_correct / len(data), epoch)

So you may want to accumulate them in separate variables that do not get reset every debug_steps (a sketch follows after this list). Also, instead of dividing by len(data), you want to divide by len(loader) * batch_size.

  3. running_correct += (locations == boxes).sum().item() will only count a bounding box as correct if it is identical in value to the ground truth, which is unlikely (especially considering these are floating-point values). Typically a metric like intersection-over-union (IoU) is used to measure the similarity between bounding boxes; a quick sketch of that also follows below.
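For the second point, something along these lines, reusing your own add_scalar call (just a sketch; epoch_loss is only an illustrative name, and I am assuming a fixed batch size):

    epoch_loss = 0.0    # accumulated over the whole epoch, never reset mid-epoch
    running_loss = 0.0  # reset every debug_steps, only used for the periodic printout

    for i, data in enumerate(loader):
        # ... forward/backward pass exactly as in train() above ...
        running_loss += loss.item()
        epoch_loss += loss.item()
        if i and i % debug_steps == 0:
            avg_loss = running_loss / debug_steps
            running_loss = 0.0  # safe to reset: epoch_loss keeps the full total

    # len(loader) is the number of batches, so multiply by the batch size
    # (the final batch may be smaller, but this is close enough for a trend line)
    tb.add_scalar('Training Loss', epoch_loss / (len(loader) * loader.batch_size), epoch)

And for the third point, a plain-Python IoU for one pair of axis-aligned boxes in (x1, y1, x2, y2) format might look like this (for batched tensors, torchvision.ops.box_iou does the same job):

    def box_iou(box_a, box_b):
        # intersection rectangle (empty intersections clamp to zero area)
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        # union = sum of the two areas minus the intersection
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter + 1e-9)

A prediction would then typically count as correct when its IoU with a ground-truth box exceeds some threshold (0.5 is common).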

I don’t have experience with TensorBoard myself, so I can’t speak to getting the images displayed, but perhaps others from the community can lend their experience there.

Thank you very much for that info, it has helped a lot.

Regarding the images: for some reason I assumed that when the test function runs the network on the validation set, the operation generates the same overlays as running detectNet normally would, so I was hopeful I could access those and write them out. I will try Stack Overflow as well.

Can’t believe I overlooked them being reset to 0…!

Thanks again

It doesn’t generate the images with the bounding boxes overlaid. An example of how that image is made is in the run_ssd_example.py script, but I believe that was meant to be run on a trained model, as opposed to during the training session.

Cool, so in the part that says:

    with torch.no_grad():
        confidence, locations = net(images)

can I not just add boxes and labels to that, or is locations the boxes? net(images) is running prediction on the images, right? So I should be able to pull the boxes and labels out of it as usual and just compose them? Looking at detectnet.cpp, it appears to render the detections on the image when detect is called.

I’m going to play around with it this evening and might work it out. Thanks.

Those are the raw outputs of the network, and I believe they still need to be clustered and re-scaled to the size of the image. You should be able to find that code in the Predictor class.
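Roughly what run_ssd_example.py does with it (a sketch assuming the mobilenetv1 SSD; the other architectures have matching create_*_predictor helpers, and the top_k of 10 and the 0.4 probability threshold are just the example script's values). Note that the example script builds the net with is_test=True, which I believe makes the forward pass return scores and corner-form boxes instead of the raw training outputs:

    import cv2
    from vision.ssd.mobilenetv1_ssd import create_mobilenetv1_ssd_predictor

    # The Predictor wraps the post-processing the raw outputs still need:
    # decoding locations against the priors, rescaling to pixel
    # coordinates, and non-maximum suppression.
    predictor = create_mobilenetv1_ssd_predictor(net, candidate_size=200)

    # orig_image: a BGR numpy image, e.g. from cv2.imread()
    image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB)
    boxes, labels, probs = predictor.predict(image, 10, 0.4)

    for i in range(boxes.size(0)):
        box = boxes[i, :].int().tolist()
        cv2.rectangle(orig_image, (box[0], box[1]), (box[2], box[3]), (255, 255, 0), 4)

From there, writing the annotated image into your SummaryWriter should be the easy part.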