Logging the hyperparameters in ClearML

Any idea how hyperparameters from the specs files could be logged in ClearML?

One can easily log such hyperparameters as shown herein:

from clearml import Task


# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name='TAO FasterRCNN 1 Class', task_name='fasterRcnn')

parameters = {
    'list': [1, 2, 3],
    'dict': {'a': 1, 'b': 2},
    'tuple': (1, 2, 3),
    'int': 3,
    'float': 2.2,
    'string': 'my string',
}
    
parameters = task.connect(parameters)

However, this code will create a separate experiment from the one that is created by the training docker.

Is there a way to direct both logs into one unique experiment?

For ClearML, please refer to TAO Toolkit Clearml Integration - NVIDIA Docs.
It seems that you want to log the spec file's hyperparameters into ClearML, which looks like a new feature request.

Yes, for instance, having the learning rate (and all the key-value pairs in the specs) logged in ClearML.
A workaround would be to pick up the experiment name (created by TAO) and then log the whole specs file inside ClearML under that experiment name, for example as sketched below.
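A minimal sketch of that workaround, assuming the project and experiment names that the TAO run ended up with are known (the names and values below are placeholders reused from the example above, not what TAO actually creates):

from clearml import Task

# Placeholder names: replace with the project/experiment that TAO actually created
tao_task = Task.get_task(project_name='TAO FasterRCNN 1 Class',
                         task_name='fasterRcnn')

# Hypothetical parsed spec values; in practice this dict would come from the spec file
specs_as_dict = {'learning_rate': 0.0001, 'batch_size_per_gpu': 1}
tao_task.connect(specs_as_dict)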

While it’s possible to collect this experiment name by copying it from the ClearML interface, I am wondering whether there is a programmatic way of collecting it from the TAO train docker?

For learning rate logging, please check if the existing way mentioned in TAO Toolkit Clearml Integration - NVIDIA Docs meets the requirement. Under "Streaming logs from the local machine running the training", the learning rate can be found in the console log.

I am aware of this functionality. I can see everything that is logged (debugging images, scalars, models, architectures, …). However, the most important thing (the hyperparameters) is not logged.

For these hyperparameters to be logged, one has to:

  1. Get the experiment identifier (which is printed in the Jupyter notebook after running the docker).
  2. Parse the specs file (or the YAML) into a dictionary and log it to the ClearML task (given the identifier).

I did this. However, it’s prone to errors. I am a bit surprised NVIDIA did not foresee this feature, as it is one of the most important features for comparing model performance over the hyperparameter space.


In case it’s of use for the community, I am sharing how I captured the experiment identifier and used it for further logging.

I assume that I am running a TAO training. First, I create a mechanism to capture the output displayed by the TAO training:

import sys
from io import StringIO
from contextlib import contextmanager

@contextmanager
def capture_output_and_display():
    # Tee stores everything written to stdout and echoes it to the real stdout,
    # so the notebook still shows the training output live.
    class Tee(StringIO):
        def write(self, string):
            StringIO.write(self, string)
            sys.__stdout__.write(string)

    old_stdout = sys.stdout
    capture_buffer = Tee()
    sys.stdout = capture_buffer
    try:
        yield capture_buffer
    finally:
        # Restore stdout even if the training command raises.
        sys.stdout = old_stdout

Then, I run the TAO training inside that context:

with capture_output_and_display() as output:
    # '!' is the Jupyter shell escape; the TAO console output is captured by the context manager
    !tao model faster_rcnn train --gpu_index $GPU_INDEX -e $SPECS_DIR/default_spec_resnet18.txt -r /workspace/tao-experiments/faster_rcnn

This will capture all the output.

Once the output is captured, I parse the experiment identifier, convert the specs file into a dictionary, and log it into ClearML:

from log_clearml import logthem
captured = output.getvalue()
logthem(captured, 'specs/default_spec_resnet18.txt')

The code for logthem() is below:

import re
from clearml import Task

def parse_block(lines, idx):
    """Recursively parse a protobuf-text-style spec block into a nested dict."""
    block = {}
    while idx < len(lines):
        line = lines[idx].strip()
        if line.endswith("{"):
            key = line[:-1].strip()
            idx, value = parse_block(lines, idx+1)
            if key in block:
                if isinstance(block[key], list):
                    block[key].append(value)
                else:
                    block[key] = [block[key], value]
            else:
                block[key] = value
        elif line == "}":
            return idx, block
        elif ":" in line:
            key, value = [x.strip() for x in line.split(":", 1)]
            # Simple type conversion
            if value.lower() == "true":
                value = True
            elif value.lower() == "false":
                value = False
            else:
                try:
                    value = float(value) if '.' in value else int(value)
                except ValueError:
                    value = value.strip('\'"')
            block[key] = value
        idx += 1
    return idx, block

def parse_text(text):
    lines = [line.strip() for line in text.splitlines() if line.strip() and not line.strip().startswith("#")]
    _, result = parse_block(lines, 0)
    return result

def read_file(filename):
    with open(filename, 'r') as f:
        return f.read()

def logthem(captured, specs):
    # The TAO console output contains a line like:
    #   ClearML Task: created new task id=<id>
    match = re.search(r'ClearML Task: created new task id=(\w+)', captured)

    if match:
        task_id = match.group(1)
        print(f"ClearML Task ID: {task_id}")
    else:
        print("Task ID not found in the captured output.")
        return

    clearml_task = Task.get_task(task_id=task_id)
    text = read_file(specs)
    parsed_dict = parse_text(text)
    clearml_task.connect(parsed_dict)
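
To illustrate what the parser produces: a small, hypothetical spec snippet in the same protobuf-text style (the field names and values below are made up for illustration; real spec files contain many more fields) is turned into a nested dictionary that connect() can take directly:

sample_spec = """
training_config {
  batch_size_per_gpu: 1
  learning_rate {
    soft_start: 0.1
  }
}
enable_qat: false
"""

print(parse_text(sample_spec))
# -> {'training_config': {'batch_size_per_gpu': 1,
#                         'learning_rate': {'soft_start': 0.1}},
#     'enable_qat': False}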

Thanks for the info. I will sync with the internal team on the feature request for capturing more parameters in ClearML.

Thanks @Morganh, appreciate it.