Here is the generated code.
```python
import numpy as np


class AdamW:
    def __init__(self, params, lr=0.001, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
        self.params = params
        self.lr = lr
        self.betas = betas
        self.eps = eps
        self.weight_decay = weight_decay
        # First- and second-moment estimates, one array per parameter
        self.m = [np.zeros_like(p) for p in params]
        self.v = [np.zeros_like(p) for p in params]
        self.t = 0  # step counter used for bias correction

    def step(self, grads):
        self.t += 1
        for i, (param, grad) in enumerate(zip(self.params, grads)):
            # Exponential moving averages of the gradient and its square
            self.m[i] = self.betas[0] * self.m[i] + (1 - self.betas[0]) * grad
            self.v[i] = self.betas[1] * self.v[i] + (1 - self.betas[1]) * (grad ** 2)
            # Bias-corrected estimates
            m_hat = self.m[i] / (1 - self.betas[0] ** self.t)
            v_hat = self.v[i] / (1 - self.betas[1] ** self.t)
            # Decoupled weight decay; the update is in place, so params must be float NumPy arrays
            param -= self.lr * (m_hat / (np.sqrt(v_hat) + self.eps) + self.weight_decay * param)
```

Code notes:
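- Implements the core AdamW algorithm, combining momentum with per-parameter adaptive learning rates.
- Supports weight decay that is decoupled from the gradient-based update. This addresses the problem that L2 regularization in Adam is rescaled by the adaptive step and therefore does not behave like true weight decay.
- Uses exponential moving averages to estimate the first and second moments of the gradient.
- Exposes the hyperparameters through the constructor: learning rate (lr), moment coefficients (betas), eps, and weight decay (weight_decay).
- Suitable for updating the parameters of deep learning models; step takes a list of gradients, one per parameter array.
- The optimizer state takes twice the memory of the parameters (one copy each for the first and second moments).
- The hyperparameters can be tuned to change training behavior, such as convergence speed and generalization.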
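
To make the note about decoupled weight decay concrete, here is a minimal sketch that compares a single step of Adam with L2 regularization against a single step of AdamW. The helper names (`adam_direction`, `adam_l2_step`, `adamw_step`) and the sample values are assumptions made for illustration and are not part of the generated class above; the AdamW update uses the same formula as its `step` method.

```python
import numpy as np


def adam_direction(grad, m, v, t, betas=(0.9, 0.999), eps=1e-8):
    """Return the bias-corrected Adam step direction and the updated moments."""
    m = betas[0] * m + (1 - betas[0]) * grad
    v = betas[1] * v + (1 - betas[1]) * grad ** 2
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    return m_hat / (np.sqrt(v_hat) + eps), m, v


def adam_l2_step(param, grad, m, v, t, lr=0.001, weight_decay=0.01):
    """Adam with L2 regularization: the decay term is folded into the gradient."""
    direction, m, v = adam_direction(grad + weight_decay * param, m, v, t)
    return param - lr * direction, m, v


def adamw_step(param, grad, m, v, t, lr=0.001, weight_decay=0.01):
    """AdamW: the decay term is applied outside the adaptive scaling."""
    direction, m, v = adam_direction(grad, m, v, t)
    return param - lr * (direction + weight_decay * param), m, v


# Hypothetical values: two weights whose gradients differ by a factor of 100.
param = np.array([1.0, 1.0])
grad = np.array([10.0, 0.1])
zeros = np.zeros_like(param)

p_plain, _, _ = adamw_step(param, grad, zeros, zeros, t=1, weight_decay=0.0)
p_l2, _, _ = adam_l2_step(param, grad, zeros, zeros, t=1)
p_w, _, _ = adamw_step(param, grad, zeros, zeros, t=1)

print("extra shrink from L2 in Adam:", p_plain - p_l2)
print("extra shrink from AdamW decay:", p_plain - p_w)
```

On the first step the Adam direction is roughly the sign of the gradient, so folding the decay term into the gradient changes the update very little. In the AdamW variant each weight shrinks by an additional `lr * weight_decay * param`, independent of the gradient magnitude.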





