giagrad.Adadelta
- class giagrad.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0.0, maximize=False)[source]
Implements the Adadelta algorithm.
Based on PyTorch's Adadelta.
\[\begin{aligned}
&\textbf{input} : \gamma \text{ (lr)},\ \theta_0 \text{ (params)},\ f(\theta) \text{ (objective)},\ \rho \text{ (decay)},\ \lambda \text{ (weight decay)} \\
&\textbf{initialize} : v_0 \leftarrow 0 \text{ (square avg)},\ u_0 \leftarrow 0 \text{ (accumulate variables)} \\
&\textbf{for}\ t = 1\ \textbf{to}\ \ldots\ \textbf{do} \\
&\hspace{5mm} g_t \leftarrow \nabla_{\theta} f_t(\theta_{t-1}) \\
&\hspace{5mm} \textbf{if}\ \lambda \neq 0 \\
&\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1} \\
&\hspace{5mm} v_t \leftarrow \rho\, v_{t-1} + (1 - \rho)\, g_t^2 \\
&\hspace{5mm} \Delta x_t \leftarrow \frac{\sqrt{u_{t-1} + \epsilon}}{\sqrt{v_t + \epsilon}}\, g_t \\
&\hspace{5mm} u_t \leftarrow \rho\, u_{t-1} + (1 - \rho)\, \Delta x_t^2 \\
&\hspace{5mm} \theta_t \leftarrow \theta_{t-1} - \gamma\, \Delta x_t \\
&\textbf{return}\ \theta_t
\end{aligned}\]
For further details regarding the algorithm we refer to ADADELTA: An Adaptive Learning Rate Method.
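To make the pseudocode concrete, the following is a minimal NumPy sketch of a single Adadelta update for one parameter array. It mirrors the symbols above (v is the square average, u accumulates the squared deltas) and is only an illustration, not giagrad's internal implementation.
>>> import numpy as np
>>> def adadelta_step(theta, grad, v, u, lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.0):
...     # g_t <- g_t + lambda * theta_{t-1}
...     if weight_decay != 0.0:
...         grad = grad + weight_decay * theta
...     # v_t <- rho * v_{t-1} + (1 - rho) * g_t**2   (square avg)
...     v = rho * v + (1 - rho) * grad**2
...     # rescale the gradient by the ratio of RMS terms
...     delta = np.sqrt(u + eps) / np.sqrt(v + eps) * grad
...     # u_t <- rho * u_{t-1} + (1 - rho) * delta**2 (accumulate variables)
...     u = rho * u + (1 - rho) * delta**2
...     # theta_t <- theta_{t-1} - lr * delta
...     return theta - lr * delta, v, u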
- Parameters:
params (iterable of Tensor) – Iterable of parameters to optimize.
rho (float, default: 0.9) – Coefficient used for computing a running average of squared gradients.
eps (float, default: 1e-6) – Term added to the denominator to improve numerical stability.
lr (float, default: 1.0) – Coefficient that scales the delta before it is applied to the parameters.
weight_decay (float, default: 0) – Weight decay (L2 penalty).
maximize (bool, default: False) – Maximize the params based on the objective, instead of minimizing.
Examples
>>> optimizer = giagrad.optim.Adadelta(model.parameters())
>>> model.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()
Methods
step() – Performs a single optimization step/epoch (parameter update).
zero_grad() – Sets the gradients of all optimized tensors to zero.
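A minimal training-loop sketch tying the two methods together; model, loss_fn, and data are assumed stand-ins and are not defined on this page.
>>> optimizer = giagrad.optim.Adadelta(model.parameters(), rho=0.9)
>>> for x, y in data:
...     optimizer.zero_grad()        # reset accumulated gradients
...     loss = loss_fn(model(x), y)  # forward pass
...     loss.backward()              # backpropagate
...     optimizer.step()             # Adadelta parameter update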