giagrad.SGD#

class giagrad.SGD(params, lr=0.001, momentum=0.0, weight_decay=0.0, dampening=0.0, nesterov=False, maximize=False)[source]#

Implements stochastic gradient descent (optionally with momentum).

Based on PyTorch SGD.

\[\begin{split}\begin{aligned} &\rule{110mm}{0.4pt} \\ &\textbf{input} : \gamma \text{ (lr)}, \: \theta_0 \text{ (params)}, \: f(\theta) \text{ (objective)}, \: \lambda \text{ (weight decay)}, \\ &\hspace{13mm} \:\mu \text{ (momentum)}, \:\tau \text{ (dampening)}, \:\textit{ nesterov,}\:\textit{ maximize} \\[-1.ex] &\rule{110mm}{0.4pt} \\ &\textbf{initialize} : b_0 \leftarrow 0 \: \text{(if momentum)} \\[-1.ex] &\rule{110mm}{0.4pt} \\ &\textbf{for} \: t=1 \: \textbf{to} \: \ldots \: \textbf{do} \\ &\hspace{5mm}g_t \leftarrow \nabla_{\theta} f_t (\theta_{t-1}) \\ &\hspace{5mm}\textbf{if} \: \lambda \neq 0 \\ &\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1} \\ &\hspace{5mm}\textbf{if} \: \mu \neq 0 \\ &\hspace{10mm}\textbf{if} \: t > 1 \\ &\hspace{15mm} \textbf{b}_t \leftarrow \mu \textbf{b}_{t-1} + (1-\tau) g_t \\ &\hspace{10mm}\textbf{else} \\ &\hspace{15mm} \textbf{b}_t \leftarrow g_t \\ &\hspace{10mm}\textbf{if} \: \textit{nesterov} \\ &\hspace{15mm} g_t \leftarrow g_{t} + \mu \textbf{b}_t \\ &\hspace{10mm}\textbf{else} \\[-1.ex] &\hspace{15mm} g_t \leftarrow \textbf{b}_t \\ &\hspace{5mm}\textbf{if} \: \textit{maximize} \\ &\hspace{10mm}\theta_t \leftarrow \theta_{t-1} + \gamma g_t \\[-1.ex] &\hspace{5mm}\textbf{else} \\[-1.ex] &\hspace{10mm}\theta_t \leftarrow \theta_{t-1} - \gamma g_t \\[-1.ex] &\rule{110mm}{0.4pt} \\[-1.ex] &\bf{return} \: \theta_t \\ &\rule{110mm}{0.4pt} \\ \end{aligned}\end{split}\]

Nesterov momentum is based on the formula from "On the importance of initialization and momentum in deep learning" (Sutskever et al., 2013).
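
The pseudocode above maps onto the following plain NumPy sketch (illustrative only; the function name, in-place update, and buffer handling are assumptions, not giagrad's internals):

import numpy as np

def sgd_step(param, grad, buf, lr=0.001, momentum=0.0, weight_decay=0.0,
             dampening=0.0, nesterov=False, maximize=False):
    """Apply one SGD update to ``param`` in place; return the momentum buffer."""
    g = grad.copy()
    if weight_decay != 0.0:              # g_t <- g_t + lambda * theta_{t-1}
        g += weight_decay * param
    if momentum != 0.0:
        if buf is None:                  # first step: b_t <- g_t
            buf = g.copy()
        else:                            # b_t <- mu * b_{t-1} + (1 - tau) * g_t
            buf = momentum * buf + (1.0 - dampening) * g
        if nesterov:                     # g_t <- g_t + mu * b_t
            g = g + momentum * buf
        else:                            # g_t <- b_t
            g = buf
    if maximize:                         # gradient ascent
        param += lr * g
    else:                                # gradient descent
        param -= lr * g
    return buf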

Variables:
  • params (iterable of Tensor) – Iterable of parameters to optimize.

  • lr (float, default: 0.001) – Learning rate.

  • momentum (float, default: 0) – Momentum factor.

  • weight_decay (float, default: 0) – Weight decay (L2 penalty).

  • dampening (float, default: 0) – Dampening for momentum.

  • nesterov (bool, default: False) – Enables Nesterov momentum.

  • maximize (bool, default: False) – Maximize the objective with respect to the params, instead of minimizing.

Examples

>>> optimizer = giagrad.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
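
In a full training loop the same calls are typically repeated once per batch. A minimal sketch, assuming a generic data_loader yielding (input, target) pairs (not a giagrad API):

>>> for input, target in data_loader:
...     optimizer.zero_grad()                    # clear gradients from the previous batch
...     loss = loss_fn(model(input), target)
...     loss.backward()                          # backpropagate
...     optimizer.step()                         # update parameters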

Methods

step

Performs a single optimization step (parameter update).

zero_grad

Sets the gradients of all optimized tensors to zero.