giagrad.nn.LayerNorm
- class giagrad.nn.LayerNorm(*args, **kwargs)
Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The mean and standard deviation are calculated over the last dimensions dimensions. For example, if dimensions is 2 (a 2-dimensional shape), the mean and standard deviation are computed over the last 2 dimensions of the input tensor (i.e. input.mean((-2, -1))). \(\gamma\) and \(\beta\) are learnable affine transform parameters if elementwise_affine is True. The standard deviation is calculated with zero degrees of freedom, equivalent to Tensor.var(ddof=0).

Note

Unlike Batch Normalization and Instance Normalization, which apply a scalar scale (\(\gamma\)) and bias (\(\beta\)) to each entire channel/plane with the affine option, Layer Normalization applies per-element scale and bias with elementwise_affine.

This layer uses statistics computed from input data in both training and evaluation modes.
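To make the computation concrete, the following is a minimal NumPy sketch of the normalization described above (an illustration of the formula, not giagrad's actual implementation; the affine step is omitted):

>>> import numpy as np
>>> def layer_norm_ref(x, dimensions, eps=1e-5):
...     axes = tuple(range(-dimensions, 0))            # e.g. (-2, -1) for dimensions=2
...     mean = x.mean(axis=axes, keepdims=True)
...     var = x.var(axis=axes, ddof=0, keepdims=True)  # zero degrees of freedom
...     return (x - mean) / np.sqrt(var + eps)
>>> y = layer_norm_ref(np.random.rand(20, 5, 10), dimensions=1)
>>> np.allclose(y.mean(axis=-1), 0, atol=1e-6)
True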
- Parameters:
dimensions (int) – Number of trailing dimensions over which normalization is computed.
eps (float, default: 1e-5) – A value added to the denominator for numerical stability.
elementwise_affine (bool, default: True) – If True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases).
- Variables:
gamma – The learnable weights of the module, of shape \(\text{input.shape}[-dimensions:]\), when elementwise_affine is set to True. The values are initialized to 1.
beta – The learnable bias of the module, of shape \(\text{input.shape}[-dimensions:]\), when elementwise_affine is set to True. The values are initialized to 0.
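For example, with a 3-dimensional input and dimensions=2, the affine parameters span the last two axes. A hedged sketch (it assumes the parameters are exposed as gamma and beta attributes and take their shape from the first input seen, which this page implies but does not spell out):

>>> layer_norm = nn.LayerNorm(dimensions=2)
>>> x = Tensor.empty(20, 5, 10).uniform()
>>> out = layer_norm(x)
>>> layer_norm.gamma.shape   # input.shape[-2:] (assumed attribute)
(5, 10)
>>> layer_norm.beta.shape    # assumed attribute
(5, 10)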
Examples
NLP Example
>>> batch, sentence_length, embedding_dim = 20, 5, 10
>>> embedding = Tensor.empty(batch, sentence_length, embedding_dim).uniform()
>>> layer_norm = nn.LayerNorm(dimensions=1)
>>> # Activate module
>>> layer_norm(embedding)
Image Example
>>> N, C, H, W = 20, 5, 10, 10
>>> input = Tensor.empty(N, C, H, W).uniform()
>>> # Normalize over the last three dimensions (i.e. the channel and spatial dimensions)
>>> layer_norm = nn.LayerNorm(dimensions=3)
>>> output = layer_norm(input)
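As a sanity check, the output should have approximately zero mean and unit variance over the normalized dimensions. A NumPy illustration with the same shapes as the image example (independent of giagrad's API):

>>> import numpy as np
>>> x = np.random.rand(20, 5, 10, 10)
>>> axes = (-3, -2, -1)                              # dimensions=3
>>> mean = x.mean(axis=axes, keepdims=True)
>>> var = x.var(axis=axes, ddof=0, keepdims=True)
>>> y = (x - mean) / np.sqrt(var + 1e-5)
>>> np.allclose(y.mean(axis=axes), 0, atol=1e-6)
True
>>> np.allclose(y.var(axis=axes, ddof=0), 1, atol=1e-3)
True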