Hi Al-Khattab,

No problem! I’m not sure if “hybrid” is the correct term, but the concept in this instance is pretty simple. I’ll assume you’ve worked through the JetBot road following / collision avoidance examples.

Suppose we have two models, `resnet18_road_follow` and `resnet18_collision_avoidance`. Each is defined as follows:

```
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
```

Each takes an image at 224x224 resolution and has 2 outputs: for collision avoidance these are `[prob blocked, prob free]`, and for road following they are the `[x, y]` coordinates of the target point.

The only differences are:

- The loss function used during training: for collision avoidance we use cross entropy; for road following we use mean squared error.
- The data provided during training: for collision avoidance we provide the class label (blocked, free); for road following we provide the target point's x, y coordinates.
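The two loss computations might look something like this. The batch size, label encoding (0 = blocked, 1 = free), and random tensors are all hypothetical placeholders for real training data.

```
import torch
import torch.nn.functional as F

# Hypothetical batch of 4 images' worth of outputs and labels
collision_logits = torch.randn(4, 2)            # collision model output
collision_labels = torch.tensor([0, 1, 1, 0])   # 0 = blocked, 1 = free (assumed encoding)
collision_loss = F.cross_entropy(collision_logits, collision_labels)

road_preds = torch.randn(4, 2)                  # road model output: predicted x, y
road_targets = torch.rand(4, 2)                 # labeled target points
road_loss = F.mse_loss(road_preds, road_targets)
```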

The architecture is identical, and the bulk of the model is likely computing natural-image features that are shared between the tasks. We could then potentially create a single model like this:

```
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 4)
```

The final layer now has 4 outputs. We can slice out the values for each task:

```
collision_out = output[..., 0:2]
road_out = output[..., 2:4]
```

And compute a combined loss

```
collision_loss = ... # code to compute cross entropy for collision task
road_loss = ... # code to compute x, y mean squared error
loss = a * collision_loss + (1.0 - a) * road_loss
```

Where `a` is just a scalar, such as `0.5`, that weights the relative importance of each task.

I’m not sure how well this would work, and you may have some difficulty balancing the importance of each task, but it’s something to try. You could also try some variants, like:

- Holding the backbone weights constant
- Replacing more than just the final layer

etc.

Please let me know if this helps or you have any questions.

Best,

John