Hi Al-Khattab,
No problem! I’m not sure if “Hybrid” is the correct term, but the concept here is pretty simple. I’ll assume you’ve already done the JetBot road following and collision avoidance examples.
Suppose we have two models, resnet18_road_follow and resnet18_collision_avoidance. Each is defined as follows:
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
Each takes an image at 224x224 resolution and has 2 outputs: for collision avoidance these are [prob blocked, prob free], and for road following they are the [x, y] coordinates of the target point.
The only differences are
- The loss function we use during training: for collision avoidance we use cross entropy; for road following we use mean squared error.
- The data provided during training: for collision avoidance we provide the class label (blocked or free); for road following we provide the target point (x, y) coordinates.
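For concreteness, the two loss computations might look something like this (a sketch with hypothetical random data standing in for real model outputs and labels):

```python
import torch
import torch.nn.functional as F

# Collision avoidance: classification over [blocked, free]
logits = torch.randn(8, 2)          # hypothetical model outputs for a batch of 8
labels = torch.randint(0, 2, (8,))  # class labels: 0 = blocked, 1 = free
collision_loss = F.cross_entropy(logits, labels)

# Road following: regress the target point (x, y)
pred_xy = torch.randn(8, 2)         # hypothetical predicted target points
true_xy = torch.randn(8, 2)         # ground-truth target points
road_loss = F.mse_loss(pred_xy, true_xy)
```

Note that cross_entropy takes raw logits (it applies softmax internally), while the regression loss compares raw coordinates directly.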
The architecture is identical, and the bulk of the model is likely used to compute natural image features that are shared across tasks. We could then potentially create just one model like this:
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 4)
The number of outputs of the final layer is now 4. We can slice out the respective task values:
collision_out = output[..., 0:2]
road_out = output[..., 2:4]
And compute a combined loss:
collision_loss = ... # code to compute cross entropy for collision task
road_loss = ... # code to compute x, y mean squared error
loss = a * collision_loss + (1.0 - a) * road_loss
Where a is just some scalar like 0.5 that weights the relative importance of each task.
I’m not sure how well this would work; you may encounter some difficulty balancing the importance of each task, but it’s something to try. You could also try some variants like
- Holding the backbone weights constant
- Replacing more than just the final layer
etc.
Please let me know if this helps or you have any questions.
Best,
John