# A Look at Optimization Algorithms for Neural Networks

2023-12-18 18:01:15

### SGD

```python
class SGD:
    """Plain stochastic gradient descent."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        # Update each parameter along the negative gradient: W <- W - lr * dL/dW
        for key in params.keys():
            params[key] -= self.lr * grads[key]
```

```python
network = nn.layernet()  # any model exposing .params and a gradient method
optimizer = SGD()

for i in range(10000):
    x_batch, t_batch = get_batch(..)
    # Get the gradients of the parameters
    grads = network.gradient(x_batch, t_batch)
    # Get the parameters themselves
    params = network.params
    optimizer.update(params, grads)
```
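The loop above needs a real network to run. As a self-contained sanity check (a toy example, not from the original post), the same SGD rule can minimize f(W) = W², whose gradient is 2W:

```python
import numpy as np

class SGD:
    """Same update rule as above: W <- W - lr * dL/dW."""
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]

# Toy objective f(W) = W^2; the gradient is 2W (lr=0.1 chosen for illustration).
params = {"W": np.array([5.0])}
optimizer = SGD(lr=0.1)
for _ in range(100):
    grads = {"W": 2 * params["W"]}
    optimizer.update(params, grads)
# Each step multiplies W by (1 - 2*lr) = 0.8, so W shrinks toward 0.
```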

### Momentum

Momentum borrows the concept of "momentum" from physics. Its update rule is v ← αv − η·∂L/∂W, followed by W ← W + v, where α (the `momentum` hyperparameter, 0.9 below) controls how much of the past velocity is retained and η is the learning rate.

```python
import numpy as np

class Momentum:
    """Momentum SGD."""

    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None  # velocity, initialized lazily on the first update

    def update(self, params, grads):
        if self.v is None:
            self.v = {}
            for key, val in params.items():
                self.v[key] = np.zeros_like(val)

        for key in params.keys():
            # v <- momentum * v - lr * grad, then W <- W + v
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]
```
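One way to see what the velocity buys you (a numeric sketch, not from the original post): under a constant gradient g, v converges to the terminal velocity −lr·g / (1 − momentum), i.e. with momentum = 0.9 the effective step is 10× that of plain SGD:

```python
import numpy as np

class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]

# Feed a constant gradient of 1.0; the velocity approaches the fixed point
# v* = -lr / (1 - momentum) = -0.1 (each step closes part of the remaining gap).
opt = Momentum(lr=0.01, momentum=0.9)
params = {"W": np.array([0.0])}
for _ in range(200):
    opt.update(params, {"W": np.array([1.0])})

print(opt.v["W"][0])  # close to -0.1
```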

### AdaGrad

AdaGrad adapts the learning rate per parameter: it accumulates the squared gradients in `h` and divides each step by √h, so parameters that have already received large updates get smaller steps.

```python
import numpy as np

class AdaGrad:
    """AdaGrad: per-parameter adaptive learning rates."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # running sum of squared gradients

    def update(self, params, grads):
        if self.h is None:
            self.h = {}
            for key, val in params.items():
                self.h[key] = np.zeros_like(val)

        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            # 1e-7 guards against division by zero
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
```
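The decay is easy to observe directly (an illustrative sketch, not from the original post): with a constant gradient of 1, `h` equals the step count t, so the effective step size falls like lr / √t:

```python
import numpy as np

class AdaGrad:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)

# With a constant gradient of 1.0 and lr=1.0, the t-th step has size ~1/sqrt(t).
opt = AdaGrad(lr=1.0)
params = {"W": np.array([0.0])}
steps = []
for _ in range(4):
    before = params["W"][0]
    opt.update(params, {"W": np.array([1.0])})
    steps.append(before - params["W"][0])

print(steps)  # roughly [1.0, 1/sqrt(2), 1/sqrt(3), 1/sqrt(4)]
```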

### Adam

Adam combines Momentum's moving average of the gradient with AdaGrad-style per-parameter scaling:

1. Initialization: set t = 0; initialize the model parameters θ, the learning rate α, and the hyperparameters β₁, β₂, ε. For each parameter, initialize the first-moment estimate m₀ = 0 and the second-moment estimate v₀ = 0.
2. At step t, compute the gradient of the objective with respect to the parameters: gₜ = ∇θ f(θₜ₋₁).
3. Update the first-moment estimate: mₜ = β₁·mₜ₋₁ + (1 − β₁)·gₜ.
4. Update the second-moment estimate: vₜ = β₂·vₜ₋₁ + (1 − β₂)·gₜ².
5. Correct the bias in both estimates: m̂ₜ = mₜ / (1 − β₁ᵗ), v̂ₜ = vₜ / (1 − β₂ᵗ).
6. Compute the adaptive step: Δθ = α·m̂ₜ / (√v̂ₜ + ε).
7. Use it to update the model parameters: θₜ = θₜ₋₁ − Δθ.
8. Set t = t + 1 and repeat steps 2-7 until convergence.
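The bias correction in step 5 is what keeps early updates at the right scale. A numeric sketch (illustrative values, not from the post): with β₁ = 0.9 and a constant gradient of 2.0, the raw average m creeps up from 0, while the corrected m̂ recovers the gradient's true scale immediately:

```python
# m starts at 0, so the raw moving average underestimates the gradient
# early on; dividing by (1 - beta1**t) rescales it exactly.
beta1 = 0.9
g = 2.0  # pretend the gradient is constantly 2.0 (illustrative value)
m = 0.0
for t in range(1, 4):
    m = beta1 * m + (1 - beta1) * g
    m_hat = m / (1 - beta1**t)
    print(t, round(m, 4), round(m_hat, 4))
# m climbs slowly (0.2, 0.38, 0.542) while m_hat is 2.0 at every step
```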

```python
import numpy as np

class Adam:
    """Adam optimizer."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None  # first-moment estimate
        self.v = None  # second-moment estimate

    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = {}, {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)

        self.iter += 1
        # Fold both bias corrections into a single step size
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)

        for key in params.keys():
            # Equivalent to m = beta1*m + (1-beta1)*grad and v = beta2*v + (1-beta2)*grad**2
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)
```
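As with the other optimizers, a self-contained toy run (illustrative objective and lr, not from the post) shows Adam pulling a parameter toward the minimum of f(W) = W²:

```python
import numpy as np

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None
        self.v = None

    def update(self, params, grads):
        if self.m is None:
            self.m = {key: np.zeros_like(val) for key, val in params.items()}
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        self.iter += 1
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)
        for key in params.keys():
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key]**2 - self.v[key])
            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)

# Toy objective f(W) = W^2 (gradient 2W); lr=0.1 is chosen for illustration.
params = {"W": np.array([3.0])}
opt = Adam(lr=0.1)
for _ in range(500):
    opt.update(params, {"W": 2 * params["W"]})

print(params["W"][0])  # W ends up near 0
```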