I use this line, `optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay)`, to apply L2 regularization and prevent overfitting. Conventionally, regularization penalizes only the model's weight parameters W and leaves the bias parameters b unpenalized. However, I have seen posts online claiming that the `weight_decay` argument of the `torch.optim` optimizers applies to every parameter in the network, i.e. it penalizes both the weights W and the biases b at the same time. Is that right?
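Here is a small experiment I sketched to check this myself (the layer and hyperparameter values are arbitrary placeholders, not from my actual training script). With the loss gradients forced to zero, any change in the bias after one step can only come from the decay term; the second half shows how per-parameter groups could exempt biases, if that turned out to be necessary:

```python
import torch

# With zero gradients, any change to the bias after opt.step()
# can only come from weight_decay itself.
torch.manual_seed(0)
lin = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(lin.parameters(), lr=0.1, weight_decay=0.1)

b_before = abs(lin.bias.item())
for p in lin.parameters():
    p.grad = torch.zeros_like(p)  # pretend the loss gradient is zero
opt.step()
b_after = abs(lin.bias.item())

print(b_after < b_before)  # True here would mean the bias is decayed too

# If one wanted to exempt biases, torch.optim supports per-group
# weight_decay; splitting on the parameter name is a common heuristic.
decay, no_decay = [], []
for name, p in lin.named_parameters():
    (no_decay if name.endswith("bias") else decay).append(p)
opt2 = torch.optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=0.1, momentum=0.9,
)
```

With zero gradients and decay factor 0.1 at lr 0.1, the bias should shrink by a factor of 0.99 per step if (and only if) `weight_decay` touches it.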