giagrad.Adam
class giagrad.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, maximize=False, amsgrad=False)[source]
Implements the Adam algorithm.
Adam maintains exponential moving averages of the gradients and of the squared gradients; the coefficients β1 and β2 control the decay rates of these moving averages.
Based on PyTorch Adam.
\[\begin{split}\begin{aligned}
&\rule{110mm}{0.4pt} \\
&\textbf{input} : \gamma \text{ (lr)}, \beta_1, \beta_2 \text{ (betas)}, \theta_0 \text{ (params)}, f(\theta) \text{ (objective)} \\
&\hspace{13mm} \lambda \text{ (weight decay)}, \: \textit{amsgrad}, \: \textit{maximize} \\[-1.ex]
&\rule{110mm}{0.4pt} \\
&\textbf{initialize} : m_0 \leftarrow 0 \text{ (first moment)}, \: v_0 \leftarrow 0 \text{ (second moment)}, \: \widehat{v_0}^{max} \leftarrow 0 \\
&\rule{110mm}{0.4pt} \\[-1.ex]
&\textbf{for} \: t = 1 \: \textbf{to} \: \ldots \: \textbf{do} \\
&\hspace{5mm}\textbf{if} \: \textit{maximize}: \\
&\hspace{10mm}g_t \leftarrow -\nabla_{\theta} f_t (\theta_{t-1}) \\
&\hspace{5mm}\textbf{else} \\
&\hspace{10mm}g_t \leftarrow \nabla_{\theta} f_t (\theta_{t-1}) \\
&\hspace{5mm}\textbf{if} \: \lambda \neq 0 \\
&\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1} \\
&\hspace{5mm}m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
&\hspace{5mm}v_t \leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g^2_t \\
&\hspace{5mm}\widehat{m_t} \leftarrow m_t / \big(1 - \beta_1^t\big) \\
&\hspace{5mm}\widehat{v_t} \leftarrow v_t / \big(1 - \beta_2^t\big) \\
&\hspace{5mm}\textbf{if} \: \textit{amsgrad} \\
&\hspace{10mm}\widehat{v_t}^{max} \leftarrow \mathrm{max}(\widehat{v_t}^{max}, \widehat{v_t}) \\
&\hspace{10mm}\theta_t \leftarrow \theta_{t-1} - \gamma \widehat{m_t} / \big(\sqrt{\widehat{v_t}^{max}} + \epsilon\big) \\
&\hspace{5mm}\textbf{else} \\
&\hspace{10mm}\theta_t \leftarrow \theta_{t-1} - \gamma \widehat{m_t} / \big(\sqrt{\widehat{v_t}} + \epsilon\big) \\
&\rule{110mm}{0.4pt} \\[-1.ex]
&\textbf{return} \: \theta_t \\
&\rule{110mm}{0.4pt} \\[-1.ex]
\end{aligned}\end{split}\]
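The update above can be written as a short NumPy sketch. This is for illustration only: it mirrors the pseudocode, it is not giagrad's internal implementation, and the helper name adam_step is hypothetical.

import numpy as np

def adam_step(theta, grad, m, v, v_max, t, lr=0.001, betas=(0.9, 0.999),
              eps=1e-8, weight_decay=0.0, maximize=False, amsgrad=False):
    # One Adam update for a single parameter array; returns the new state.
    beta1, beta2 = betas
    g = -grad if maximize else grad              # ascend instead of descend
    if weight_decay != 0.0:
        g = g + weight_decay * theta             # L2 penalty folded into the gradient
    m = beta1 * m + (1 - beta1) * g              # first moment (EMA of gradients)
    v = beta2 * v + (1 - beta2) * g ** 2         # second moment (EMA of squared gradients)
    m_hat = m / (1 - beta1 ** t)                 # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    if amsgrad:
        v_max = np.maximum(v_max, v_hat)         # running max of the corrected second moment
        theta = theta - lr * m_hat / (np.sqrt(v_max) + eps)
    else:
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v, v_max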
Variables:
params (iterable of Tensor) – Iterable of parameters to optimize.
lr (float, default: 0.001) – Learning rate.
betas (Tuple[float, float], default: (0.9, 0.999)) – Coefficients used to compute the exponential moving averages of the gradient and its square.
eps (float, default: 1e-8) – Term added to the denominator to improve numerical stability.
weight_decay (float, default: 0) – Weight decay (L2 penalty).
maximize (bool, default: False) – Maximize the objective with respect to the params, instead of minimizing.
amsgrad (bool, default: False) – Option to use the AMSGrad variant of this algorithm.
Examples
>>> optimizer = giagrad.optim.Adam(model.parameters(), lr=0.1, amsgrad=True)
>>> model.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
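All hyperparameters can also be set explicitly when constructing the optimizer; the values below are arbitrary and chosen only for illustration.

>>> optimizer = giagrad.optim.Adam(
...     model.parameters(), lr=3e-4, betas=(0.9, 0.99),
...     eps=1e-8, weight_decay=1e-4, maximize=False, amsgrad=True)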
Methods

step()
    Performs a single optimization step/epoch (parameter update).

zero_grad()
    Sets the gradients of all optimized tensors to zero.
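A typical training loop alternates the two methods above around a backward pass. This is a minimal sketch; model, loss_fn and the data iterable are assumed to come from user code, as in the example above.

>>> optimizer = giagrad.optim.Adam(model.parameters())
>>> for x, y in data:
...     optimizer.zero_grad()            # clear accumulated gradients
...     loss = loss_fn(model(x), y)      # forward pass
...     loss.backward()                  # backpropagate
...     optimizer.step()                 # apply the Adam update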