optimizer = torch.optim.SGD()

I use the line "optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay)" to apply L2 regularization and prevent overfitting. Generally, regularization only penalizes the model's weight parameter w, while the bias parameter b is not penalized. However, I have seen claims online that the weight decay specified by the weight_decay parameter of a torch.optim optimizer applies to all parameters in the network, penalizing both the weight w and the bias b. Is that right?

Reference URL: "How to implement L2 and L1 regularization in PyTorch", PKing666666's blog on CSDN (torch regularization)

Hi,

You can find some torch.optim.SGD introduction in the following document:
https://pytorch.org/docs/stable/optim.html#torch.optim.SGD
https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD

In general, SGD is an optimizer that operates on trainable parameters.
Both the weight and the bias are trainable parameters, so weight_decay will be applied to both of them.
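
As a quick illustration (a minimal sketch with a hypothetical model and made-up hyperparameter values), every parameter returned by model.parameters(), weight and bias alike, ends up in the same parameter group, so the same weight_decay is applied to all of them:

```python
import torch
import torch.nn as nn

# Hypothetical model: a single linear layer, which has both a weight and a bias.
model = nn.Linear(4, 2)

# All parameters go into one group, so weight_decay is applied to every one of them.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

for name, param in model.named_parameters():
    print(name, param.shape)                       # 'weight' and 'bias' are both trainable
print(optimizer.param_groups[0]['weight_decay'])   # 1e-4, shared by weight and bias
```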

Thanks.

How can I apply weight decay only to the weight w and not to the bias parameter b?

Hi,

You can check this comment for some information:

Thanks.
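
For reference, one common approach (a sketch using the per-parameter option groups that torch.optim supports, not necessarily what the linked comment describes) is to put the weights and the biases into separate parameter groups and set weight_decay only for the weight group:

```python
import torch
import torch.nn as nn

# Hypothetical model used for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Split parameters: weights get the L2 penalty, biases do not.
decay, no_decay = [], []
for name, param in model.named_parameters():
    (no_decay if name.endswith('bias') else decay).append(param)

optimizer = torch.optim.SGD(
    [
        {'params': decay, 'weight_decay': 1e-4},    # penalize weights
        {'params': no_decay, 'weight_decay': 0.0},  # leave biases unpenalized
    ],
    lr=0.1,
    momentum=0.9,
)
```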