giagrad.Adamax#
- class giagrad.Adamax(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, maximize=False)[source]#
Implements the Adamax algorithm (a variant of Adam based on the infinity norm).
Based on PyTorch's Adamax.
\[\begin{split}\begin{aligned}
    &\rule{110mm}{0.4pt} \\
    &\textbf{input} : \gamma \text{ (lr)}, \: \beta_1, \beta_2 \text{ (betas)}, \: \theta_0 \text{ (params)}, \: f(\theta) \text{ (objective)}, \: \lambda \text{ (weight decay)}, \\
    &\hspace{13mm} \epsilon \text{ (epsilon)} \\[-1.ex]
    &\rule{110mm}{0.4pt} \\
    &\textbf{initialize} : m_0 \leftarrow 0 \: \text{(first moment)}, \: u_0 \leftarrow 0 \: \text{(infinity norm)} \\[-1.ex]
    &\rule{110mm}{0.4pt} \\
    &\textbf{for} \: t = 1 \: \textbf{to} \: \ldots \: \textbf{do} \\
    &\hspace{5mm} g_t \leftarrow \nabla_{\theta} f_t(\theta_{t-1}) \\
    &\hspace{5mm} \textbf{if} \: \lambda \neq 0 \\
    &\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1} \\
    &\hspace{5mm} m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
    &\hspace{5mm} u_t \leftarrow \max(\beta_2 u_{t-1}, |g_t| + \epsilon) \\
    &\hspace{5mm} \theta_t \leftarrow \theta_{t-1} - \frac{\gamma m_t}{(1 - \beta_1^t)\, u_t} \\
    &\rule{110mm}{0.4pt} \\[-1.ex]
    &\textbf{return} \: \theta_t \\
    &\rule{110mm}{0.4pt} \\[-1.ex]
\end{aligned}\end{split}\]
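As a rough illustration of the pseudocode above (not giagrad's internal implementation), one Adamax step on a NumPy array could be written as follows; adamax_step and all argument names are placeholders:

import numpy as np

def adamax_step(param, grad, m, u, t, lr=1e-3, beta1=0.9, beta2=0.999,
                eps=1e-8, weight_decay=0.0):
    # Optional L2 penalty folded into the gradient.
    if weight_decay != 0.0:
        grad = grad + weight_decay * param
    # First moment: exponential moving average of the gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    # Exponentially weighted infinity norm.
    u = np.maximum(beta2 * u, np.abs(grad) + eps)
    # Bias-corrected parameter update (t starts at 1).
    param = param - lr * m / ((1.0 - beta1 ** t) * u)
    return param, m, u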
Variables:
params (iterable of Tensor) – Iterable of parameters to optimize.
lr (float, default: 0.001) – Learning rate.
betas (Tuple[float,float], default: (0.9, 0.999)) – Coefficients for the running first moment estimate and the exponentially weighted infinity norm, respectively.
eps (float, default: 1e-8) – Term added to improve numerical stability (keeps the denominator u_t away from zero).
weight_decay (float, default: 0) – Weight decay (L2 penalty).
maximize (bool, default: False) – Maximize the params based on the objective, instead of minimizing.
Examples
>>> optimizer = giagrad.optim.Adamax(model.parameters())
>>> model.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
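With non-default hyperparameters (the values below are purely illustrative):

>>> optimizer = giagrad.optim.Adamax(
...     model.parameters(), lr=0.002, betas=(0.9, 0.999),
...     eps=1e-8, weight_decay=0.01
... )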
Methods
step()
Performs a single optimization step (parameter update).

zero_grad()
Sets the gradients of all optimized tensors to zero.
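Putting the step() and zero_grad() methods listed above together, a minimal training loop might look like this, assuming model, loss_fn, and an iterable data of (input, target) pairs already exist:

>>> optimizer = giagrad.optim.Adamax(model.parameters())
>>> for input, target in data:
...     optimizer.zero_grad()                  # reset accumulated gradients
...     loss = loss_fn(model(input), target)   # forward pass
...     loss.backward()                        # backpropagate
...     optimizer.step()                       # Adamax update of the parameters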