# Optimizer = torch.optim.SGD()

I use the line "optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay)" to apply L2 regularization and prevent overfitting. Generally, regularization penalizes only the weights W of the model and leaves the biases b unpenalized. However, I have read online that the weight decay set through the weight_decay parameter of a torch.optim optimizer applies to all parameters in the network, penalizing both the weights w and the biases b. Is that right?

Hi,

You can find an introduction to torch.optim.SGD in the following document:

In general, SGD updates every trainable parameter that is passed to it.
Both the weights and the biases are trainable parameters, so weight decay is applied to both of them.
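
A quick way to check this is to take one optimizer step with all gradients set to zero, so that any change in the parameters can only come from weight decay. The sketch below is only an illustration: the layer size, lr=0.1 and weight_decay=0.5 are values picked to make the arithmetic easy, not values from the question.

```python
import torch

# A tiny layer with known starting values (illustrative sizes only).
layer = torch.nn.Linear(2, 1)
torch.nn.init.ones_(layer.weight)
torch.nn.init.ones_(layer.bias)

# Same call style as in the question; the numbers are assumptions.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1, weight_decay=0.5)

# Zero gradients: any parameter change now comes from weight decay alone.
for p in layer.parameters():
    p.grad = torch.zeros_like(p)
optimizer.step()

weight_after = layer.weight.detach().clone()
bias_after = layer.bias.detach().clone()

# Each parameter is updated as p - lr * (grad + weight_decay * p),
# i.e. 1 - 0.1 * (0 + 0.5 * 1) = 0.95, for the weight and the bias alike.
print(weight_after, bias_after)
```

If the bias were excluded from weight decay, it would still be 1.0 after this step.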

Thanks.

How can I apply weight decay only to the weight w and not to the bias b?

Hi,

You can check this comment for some information:

Thanks.
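
For reference, one common way to do this is with the per-parameter options of torch.optim: pass a list of parameter groups, each with its own weight_decay. The sketch below is a minimal example under some assumptions. The small model, the hyperparameter values, and the rule "a parameter whose name ends in bias is a bias" are all illustrative choices and may differ from what the linked comment suggests.

```python
import torch

# Hypothetical stand-in for the real model.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

# Split parameters into two lists by name.
decay, no_decay = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue  # frozen parameters are not optimized at all
    # Assumption: biases are identified by their name suffix.
    if name.endswith("bias"):
        no_decay.append(param)
    else:
        decay.append(param)

# Each dict is one parameter group; options set inside a group override
# the defaults given as keyword arguments (lr and momentum here).
optimizer = torch.optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},   # weights: decayed
        {"params": no_decay, "weight_decay": 0.0},  # biases: not decayed
    ],
    lr=0.1,
    momentum=0.9,
)
```

In the original script the placeholder values would be replaced by args.lr, args.momentum and args.weight_decay. The same name-based split can also be extended to exclude other parameters, such as those of normalization layers, if that is wanted.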