giagrad.Adadelta#

class giagrad.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0.0, maximize=False)[source]#

Implements the Adadelta algorithm.

Based on PyTorch Adadelta.

\[\begin{split}\begin{aligned}
    &\rule{110mm}{0.4pt}                                                                 \\
    &\textbf{input}      : \gamma \text{ (lr)}, \: \theta_0 \text{ (params)},
                           \: f(\theta) \text{ (objective)}, \: \rho \text{ (decay)},
                           \: \lambda \text{ (weight decay)}                             \\
    &\rule{110mm}{0.4pt}                                                                 \\
    &\textbf{initialize} : v_0 \leftarrow 0 \: \text{ (square avg)},
                           \: u_0 \leftarrow 0 \: \text{ (accumulate variables)}         \\[-1.ex]
    &\rule{110mm}{0.4pt}                                                                 \\
    &\textbf{for} \: t=1 \: \textbf{to} \: \ldots \: \textbf{do}                         \\
    &\hspace{5mm}  g_t \leftarrow \nabla_{\theta} f_t (\theta_{t-1})                     \\
    &\hspace{5mm}  \textbf{if} \: \lambda \neq 0                                         \\
    &\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1}                             \\
    &\hspace{5mm}  v_t \leftarrow v_{t-1} \rho + g^2_t (1 - \rho)                        \\
    &\hspace{5mm}  \Delta x_t \leftarrow \frac{\sqrt{u_{t-1} + \epsilon}}
                                              {\sqrt{v_t + \epsilon}} g_t                \\
    &\hspace{5mm}  u_t \leftarrow u_{t-1} \rho + \Delta x^2_t (1 - \rho)                 \\
    &\hspace{5mm}  \theta_t \leftarrow \theta_{t-1} - \gamma \Delta x_t                  \\[-1.ex]
    &\rule{110mm}{0.4pt}                                                                 \\
    &\textbf{return} \: \theta_t                                                         \\[-1.ex]
    &\rule{110mm}{0.4pt}                                                                 \\
\end{aligned}\end{split}\]

For further details regarding the algorithm we refer to ADADELTA: An Adaptive Learning Rate Method.
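The pseudocode above can be read as a few lines of array code. Below is a minimal NumPy sketch of one Adadelta update for a single parameter array; it is illustrative only and does not use giagrad's internal API (the names param, grad, square_avg and acc_delta are placeholders).

    import numpy as np

    def adadelta_step(param, grad, square_avg, acc_delta,
                      lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.0):
        """One Adadelta update following the pseudocode above (illustrative)."""
        if weight_decay != 0.0:
            grad = grad + weight_decay * param                   # g_t <- g_t + lambda * theta_{t-1}
        square_avg = rho * square_avg + (1.0 - rho) * grad**2    # v_t  (running avg of squared grads)
        delta = np.sqrt(acc_delta + eps) / np.sqrt(square_avg + eps) * grad   # Delta x_t
        acc_delta = rho * acc_delta + (1.0 - rho) * delta**2     # u_t  (running avg of squared deltas)
        param = param - lr * delta                               # theta_t
        return param, square_avg, acc_delta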

Parameters:
  • params (iterable of Tensor) – Iterable of Tensors to optimize.

  • rho (float, default: 0.9) – Coefficient used for computing a running average of squared gradients.

  • eps (float, default: 1e-6) – Term added to the denominator to improve numerical stability.

  • lr (float, default: 1.0) – Coefficient that scales the delta before it is applied to the parameters.

  • weight_decay (float, default: 0) – Weight decay (L2 penalty).

  • maximize (bool, default: False) – Maximize the params based on the objective, instead of minimizing.
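
All of these hyperparameters can be overridden at construction time. A short sketch, assuming model.parameters() yields the giagrad Tensors to optimize:

>>> optimizer = giagrad.optim.Adadelta(
...     model.parameters(), lr=1.0, rho=0.9, eps=1e-6, weight_decay=1e-4
... )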

Examples

>>> optimizer = giagrad.optim.Adadelta(model.parameters())
>>> model.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

Methods

step

Performs a single optimization step (parameter update).

zero_grad

Sets the gradients of all optimized tensors to zero.
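
In a training loop these two methods are typically called once per batch. A minimal sketch, assuming model, loss_fn and an iterable data of (input, target) batches are defined elsewhere:

>>> for input, target in data:
...     optimizer.zero_grad()
...     loss = loss_fn(model(input), target)
...     loss.backward()
...     optimizer.step()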