Colour space management and calibration

We are using detection CNNs for an application that detects very subtle and specific colour differences within the captured image. We have found that using a different camera model greatly affects our accuracy, because our training data was captured with a limited set of cameras.

We would like to support multiple camera models, but it would not be feasible to collect training sets for each camera model.

We are currently addressing this problem by generating an ICC profile for each camera and using it to transform images to a standard colour space for training and inference. We use homegrown infrastructure to handle this. It works reasonably well, but there is still a lot of improvement and research to be done.

Is there a standard framework or procedure for handling colour calibrations, best practices, etc. in the context of deep learning? This would greatly reduce the effort and research required on our side.

Can you share more about your solution?

Specifically:

  • Specifications of the sensors used, make, model, etc.
  • Additional details of the implementation, whether or not the Jetson ISP is used.
  • Details about the deltas you’re seeing between cameras and how they’re measured.

Also, to confirm: it sounds like you’re interested in both (A) tactical mitigations for challenges with the current approach and, more so, (B) strategic suggestions for overcoming this category of problem with a scalable solution that might remove the need for the homegrown mitigation.

Is that correct?

Intro

We found that the colour response of different camera models under the same lighting conditions can vary considerably. This meant that the large volumes of training data we had previously collected and labelled were of little use with a different camera model.

We initially tried to get the colour spaces to match by adjusting the cameras’ white balances, but were unable to get an accurate enough match.

Sensors

We use a range of machine vision and DSLR cameras. For example:

  • PixeLINK PL-D755-CU
  • PixeLINK PL-D753-CU
  • Allied Vision Alvium 1500 C-500
  • Allied Vision Alvium 1800 C-507
  • Nikon D5600

Workflow and implementation

Generating ICC Profiles

For each camera we capture images of a ColorChecker inside a light box with a standard illuminant.

We then generate the ICC profiles using the Argyll CMS scanin and colprof CLI applications.
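For reference, the two Argyll steps can be scripted roughly as below. This is a sketch under assumptions: the chart template and reference file locations (`ref/ColorChecker.cht`, `ref/ColorChecker.cie`) are placeholders for wherever your Argyll installation keeps them, we assume `scanin` writes `<image basename>.ti3` next to the capture, and the `-as` (shaper/matrix) profile type is only one option that is worth validating against alternatives such as `-am` (matrix only) for a 24-patch chart. With `dry_run=True` the commands are only printed.

```python
import shlex
import subprocess
from pathlib import Path

# Assumed locations of Argyll's ColorChecker recognition template and reference values.
CHART_TEMPLATE = Path("ref/ColorChecker.cht")
CHART_REFERENCE = Path("ref/ColorChecker.cie")


def profile_commands(chart_image: Path, description: str) -> list[list[str]]:
    """Build the scanin + colprof command lines for one camera's chart capture."""
    basename = chart_image.with_suffix("")  # assumed: scanin writes <basename>.ti3
    scanin = [
        "scanin", "-v",
        "-p",  # compensate for perspective distortion of the chart in the photo
        str(chart_image), str(CHART_TEMPLATE), str(CHART_REFERENCE),
    ]
    colprof = [
        "colprof", "-v",
        "-D", description,  # profile description string
        "-qm",              # medium quality
        "-as",              # shaper + matrix profile (assumption; compare with -am)
        str(basename),      # reads <basename>.ti3, writes <basename>.icc
    ]
    return [scanin, colprof]


def build_profile(chart_image: Path, description: str, dry_run: bool = True) -> None:
    """Print (and optionally run) the profiling commands in order."""
    for cmd in profile_commands(chart_image, description):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```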

Converting Images

Now that we have an ICC profile for each camera, we are able to convert their images to the colour spaces of other cameras or to the standard sRGB colour space. We do this with the ImageCms module in Pillow.
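A minimal sketch of the camera-to-sRGB step with Pillow's `ImageCms`. Building the transform is the costly part, so it is built once per camera and reused for every frame. We leave the rendering intent at Pillow's default (perceptual), which is an assumption you may want to revisit for colorimetric work.

```python
from PIL import Image, ImageCms


def make_camera_to_srgb(camera_profile):
    """Build a reusable RGB->RGB transform from a camera ICC profile to sRGB.

    `camera_profile` may be a path to an .icc file or an ImageCms profile object.
    """
    srgb = ImageCms.createProfile("sRGB")
    return ImageCms.buildTransform(camera_profile, srgb, "RGB", "RGB")


def to_srgb(image: Image.Image, transform) -> Image.Image:
    """Apply a prebuilt transform; returns a new image (inPlace defaults to False)."""
    return ImageCms.applyTransform(image.convert("RGB"), transform)
```

Usage would be `transform = make_camera_to_srgb("profiles/camera1.icc")` once per camera (the path is hypothetical), then `to_srgb(frame, transform)` for each frame.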

Training and Inferring

The easiest method would now be to transform each image to a standardised colour space (e.g. sRGB) and use these images to train the model. One would then convert each captured image to sRGB before running inference on it.

Our use case requires very low-latency, real-time inference, and converting the colour space before inference is yet another pre-processing step that adds latency. We therefore train a model for each camera model by converting the training data to that camera’s colour space before training. We can then run inference on that camera’s images with minimal pre-processing.
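The per-camera variant amounts to one offline pass over the training set. A sketch, assuming PNG training images and hypothetical directory names: `src_profile` is the profile the data is currently in (e.g. sRGB or the original capture camera's ICC profile) and `dst_profile` is the target camera's profile. Because only pixel values change, bounding-box labels carry over unchanged.

```python
from pathlib import Path

from PIL import Image, ImageCms


def convert_dataset(src_dir, dst_dir, src_profile, dst_profile, pattern="*.png"):
    """Re-render every image in `src_dir` into the target camera's colour space.

    The transform is built once and reused for all images. Files keep their
    names; saving as PNG (lossless) avoids compression eroding subtle colours.
    """
    transform = ImageCms.buildTransform(src_profile, dst_profile, "RGB", "RGB")
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(Path(src_dir).glob(pattern)):
        with Image.open(path) as img:
            out = ImageCms.applyTransform(img.convert("RGB"), transform)
        target = dst_dir / path.name
        out.save(target)
        converted.append(target)
    return converted
```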

Camera deltas

The attached image contains a table with some raw camera examples (top row) and their conversions to the standard sRGB colour space (bottom row). Notice that the Camera 1 raw images are a much more intense green than those of Camera 2, while the converted sRGB images are much closer to each other.
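The comparison above is visual. If you want a number for the camera-to-camera deltas, one common option (our suggestion, not necessarily how the table was produced) is the CIE76 colour difference ΔE*ab between aligned crops, e.g. the same ColorChecker patch from two cameras after both are converted to sRGB. The sketch assumes 8-bit sRGB input with a D65 white point; CIEDE2000 is more perceptually uniform if you need finer discrimination.

```python
import numpy as np

# Linear sRGB (D65) -> CIE XYZ, and the D65 reference white.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb: np.ndarray) -> np.ndarray:
    """Convert an (..., 3) array of sRGB values in [0, 1] to CIELAB."""
    # Undo the sRGB transfer curve to get linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65  # normalised by the white point
    delta = 6 / 29
    f = np.where(xyz > delta**3, np.cbrt(xyz), xyz / (3 * delta**2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def mean_delta_e76(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Mean CIE76 colour difference between two aligned uint8 sRGB images."""
    lab_a = srgb_to_lab(img_a.astype(np.float64) / 255.0)
    lab_b = srgb_to_lab(img_b.astype(np.float64) / 255.0)
    return float(np.linalg.norm(lab_a - lab_b, axis=-1).mean())
```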

With regard to your question:

Also, to confirm: it sounds like you’re interested in both (A) tactical mitigations for challenges with the current approach and, more so, (B) strategic suggestions for overcoming this category of problem with a scalable solution that might remove the need for the homegrown mitigation.

This is correct: a strategic solution that removes the need for our homegrown solution would be ideal, but any tactical ideas would also be greatly appreciated.

Thanks so much for sharing–and for your patience as we discussed and compiled solution elements.

We have come up with several methods and practices that might help, falling into basically three groups:

  • Augment training data to enable a single robust model to cover new camera color profiles without retraining
  • Accelerate color profile conversion so that it can happen under your inference latency budget
  • Use a deep learning method to integrate color profile conversion into the inference itself

More detailed discussion, arranged roughly in order of most tactical to most strategic:

Paired-image conversion model: after collecting an ICC profile from a new camera and generating a (perhaps partial) training dataset via your existing color profile conversion method, train a small NN model to generate the original image from the new-profile image. Since the transforms are simple compared to most DNN target functions, that should be a fast and accurate training job; then, at inference time, use the camera-specific color conversion model as the first few layers before your original detection model.
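To make the "first few layers" idea concrete, here is a PyTorch sketch. The detector in the test is a hypothetical stand-in, and we assume your detector takes a single `(N, 3, H, W)` tensor in the original camera's colour space with values in [0, 1]; detectors that take lists of images or normalise internally would need a small wrapper instead of `nn.Sequential`. The converter architecture mirrors the 1x1-convolution model discussed later in this thread.

```python
import torch
from torch import nn


def make_converter() -> nn.Module:
    """Small per-pixel colour mapping: 1x1 convolutions only, output in [0, 1]."""
    return nn.Sequential(
        nn.Conv2d(3, 8, 1), nn.SELU(),
        nn.Conv2d(8, 8, 1), nn.SELU(),
        nn.Conv2d(8, 3, 1), nn.Sigmoid(),
    )


def with_camera_adapter(converter: nn.Module, detector: nn.Module) -> nn.Module:
    """Prepend a trained new-camera -> original-space converter to a detector.

    Both parts are frozen and put in eval mode for inference.
    """
    for p in list(converter.parameters()) + list(detector.parameters()):
        p.requires_grad_(False)
    return nn.Sequential(converter, detector).eval()
```

Because the result is a single module, it can be exported and optimised as one graph, so the colour conversion does not need a separate pre-processing stage.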

Augmentation by profile: pool full or partial training data together from multiple camera profiles and train a unified detector, leaving some of the color profiles’ data out as a test set to check whether the model is robust enough to generalize to unseen cameras.

  • This method would be more likely to work if it included a range of color profiles beyond those of the cameras you have direct access to, e.g. by using Photoshop’s JavaScript convertProfile method or by downloading other profiles to feed through the ImageCms/LittleCMS 2 pipeline you’re using.

  • You could also expand the color profile set by interpolation between existing ones or jittering around them.
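Two pieces of the above, sketched under assumptions: a leave-one-camera-out split (each fold holds out every image from one profile, so the test set is a genuinely unseen camera), and a cheap pixel-space stand-in for interpolating or jittering profiles. The blending and the random near-identity colour matrix are our approximation, not a true interpolation of ICC profile contents; the profile IDs and the `scale` value are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

import numpy as np


def leave_one_camera_out(
    samples: Sequence[Tuple[str, str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_profile, train_items, test_items), one fold per profile.

    `samples` holds (profile_id, item) pairs, e.g. (camera profile name, image path).
    """
    by_profile: Dict[str, List[str]] = defaultdict(list)
    for profile, item in samples:
        by_profile[profile].append(item)
    for held_out in sorted(by_profile):
        train = [item for profile in sorted(by_profile) if profile != held_out
                 for item in by_profile[profile]]
        yield held_out, train, list(by_profile[held_out])


def blend_renderings(img_a: np.ndarray, img_b: np.ndarray, t: float) -> np.ndarray:
    """Mix one scene rendered through profiles A and B (uint8 RGB, same shape).

    Pixel-space approximation of an in-between profile, not an ICC interpolation.
    """
    mixed = (1.0 - t) * img_a.astype(np.float32) + t * img_b.astype(np.float32)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


def jitter_colors(img: np.ndarray, scale: float, rng: np.random.Generator) -> np.ndarray:
    """Apply a random near-identity 3x3 colour matrix plus offset (uint8 RGB)."""
    matrix = np.eye(3) + rng.normal(0.0, scale, size=(3, 3))
    offset = rng.normal(0.0, scale, size=3)
    rgb = img.astype(np.float64) / 255.0
    out = rgb @ matrix.T + offset
    return np.clip(np.rint(out * 255.0), 0, 255).astype(np.uint8)
```

If you already use scikit-learn, its `LeaveOneGroupOut` produces the same kind of folds from a per-sample group array.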

Accelerated augmentation: use NVIDIA DALI’s HSV augmentation or Brightness/Contrast augmentation to expand the training dataset with synthetic color modifications, speculating that the detector can work with features that are preserved across those augmentations.

  • Since you described the color differences of interest as subtle and specific, this kind of augmentation may be less helpful than usual–but even then, I’ve often been surprised at how well deep learning can accommodate those challenges via indirectly relevant features.

  • DALI has the advantage of doing the augmentation online, so there isn’t a large preprocessing step and you can continue running augmentations until you’re satisfied with the model accuracy.

Accelerated color profile conversion: DALI’s color space conversion doesn’t currently cover ICC profile pairs, so we are discussing this with that team. I imagine that a GPU-accelerated method would fit within your latency budget.
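Until DALI supports ICC pairs, one possible GPU path (our suggestion, not a DALI feature) is to bake the profile-pair transform into a 3D lookup table once on the CPU with Pillow, then apply it on the GPU with trilinear interpolation via `torch.nn.functional.grid_sample`. The LUT size is an assumption, and trilinear interpolation only approximates the original transform, so it is worth checking the error against Pillow's output on your own images before relying on it for subtle colour differences.

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image, ImageCms


def bake_lut(src_profile, dst_profile, lut_size: int = 52) -> torch.Tensor:
    """Sample an ICC profile-pair transform into a (1, 3, N, N, N) 3D LUT.

    Axis order is (blue, green, red) to match grid_sample's (z, y, x) order.
    lut_size - 1 must divide 255 so every node is an exact 8-bit value.
    """
    assert 255 % (lut_size - 1) == 0, "choose e.g. 16, 18 or 52"
    levels = np.arange(lut_size, dtype=np.int32) * (255 // (lut_size - 1))
    b, g, r = np.meshgrid(levels, levels, levels, indexing="ij")
    grid = np.stack([r, g, b], axis=-1).astype(np.uint8)  # (N, N, N, 3)
    tile = Image.fromarray(grid.reshape(lut_size * lut_size, lut_size, 3))
    transform = ImageCms.buildTransform(src_profile, dst_profile, "RGB", "RGB")
    out = np.asarray(ImageCms.applyTransform(tile, transform)).reshape(grid.shape)
    lut = torch.from_numpy(out.astype(np.float32) / 255.0)  # (B, G, R, 3)
    return lut.permute(3, 0, 1, 2).unsqueeze(0).contiguous()  # (1, 3, B, G, R)


def apply_lut(images: torch.Tensor, lut: torch.Tensor) -> torch.Tensor:
    """Trilinearly apply `lut` to a (batch, 3, H, W) RGB tensor in [0, 1]."""
    # Map [0, 1] colours to [-1, 1] grid coordinates ordered (x=r, y=g, z=b).
    coords = images.permute(0, 2, 3, 1).unsqueeze(1) * 2.0 - 1.0  # (batch, 1, H, W, 3)
    out = F.grid_sample(
        lut.expand(images.shape[0], -1, -1, -1, -1), coords,
        mode="bilinear",  # trilinear interpolation for 5-D inputs
        padding_mode="border", align_corners=True,
    )
    return out.squeeze(2)  # (batch, 3, H, W)
```

On a GPU, move both tensors first, e.g. `apply_lut(frames.cuda(), lut.cuda())`; the LUT is baked once per camera and reused for every frame.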

Deep color constancy: a technique like the ones described in Cross-Camera Convolutional Color Constancy (code) or Replacing Mobile Camera ISP with a Single Deep Learning Model could allow you to use a single detection model across many cameras without manual color profile calibration via a more general deep-learned color conversion.

Hopefully, one or more of those can help! Really appreciate the chance to dig into this space with you.


Thank you for your response @chris.milroy. It is very helpful.

The first-prize solution would be to have ICC profile conversion implemented in DALI, because then we could use the DALI backend for image preprocessing in Triton Inference Server, which would be really nice and elegant. So if you have any sway in assigning the DALI team’s workload, that would go a long way for us ;-)

For now, I think we are going to investigate, and hopefully go forward with, training a small NN model to perform specific ICC conversions. The Cross-Camera Convolutional Color Constancy paper presents a very interesting concept.

Fantastic! Thanks so much @sidney.rubidge, and glad it was helpful.

Very much agreed that a DALI solution would be elegant and convenient–your feedback on that (and on the approaches you’re pursuing now) is super useful as we try to target our engineering.

I haven’t done an extensive investigation of the minimal NN that could perform profile conversion, but tests with the following PyTorch model produced visually indistinguishable and low-loss results with the profiles I tested, after training on ~7000 1280x720 image pairs randomly cropped to 128x128 patches each epoch:

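```python
import torch

# Per-pixel colour mapping: 1x1 convolutions transform each pixel's RGB independently
network = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 1),
    torch.nn.SELU(),
    torch.nn.Conv2d(8, 8, 1),
    torch.nn.SELU(),
    torch.nn.Conv2d(8, 3, 1),
    torch.nn.Sigmoid(),  # keep outputs in [0, 1]
)
```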

I’d expect that to add <1ms of additional inference latency when integrated and optimized.
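For anyone wanting to reproduce this, a possible training loop. The architecture matches the model above; the MSE loss, Adam optimiser, learning rate, batch size and step count are our assumptions, since only the data size and the 128x128 crop size are given here. Each pair is an aligned `(new-profile image, original image)` tensor pair in [0, 1], following the paired-image conversion idea earlier in the thread.

```python
import torch
from torch import nn


def random_paired_crops(src: torch.Tensor, dst: torch.Tensor, size: int, count: int):
    """Take `count` aligned random crops from a (3, H, W) image pair."""
    _, h, w = src.shape
    ys = torch.randint(0, h - size + 1, (count,)).tolist()
    xs = torch.randint(0, w - size + 1, (count,)).tolist()
    a = torch.stack([src[:, y:y + size, x:x + size] for y, x in zip(ys, xs)])
    b = torch.stack([dst[:, y:y + size, x:x + size] for y, x in zip(ys, xs)])
    return a, b


def train_converter(pairs, steps=500, crop=128, batch=16, lr=1e-2):
    """Fit the 1x1-conv converter to map new-profile images back to the original space."""
    model = nn.Sequential(
        nn.Conv2d(3, 8, 1), nn.SELU(),
        nn.Conv2d(8, 8, 1), nn.SELU(),
        nn.Conv2d(8, 3, 1), nn.Sigmoid(),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for step in range(steps):
        src, dst = pairs[step % len(pairs)]  # cycle through the image pairs
        x, y = random_paired_crops(src, dst, crop, batch)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return model, losses
```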
