azula.nn.layers

Common layers.

Classes

`ReLU2`	Creates a ReLU² activation layer.
`SwiGLU`	Creates a SwiGLU activation layer.
`LayerNorm`	Creates a layer that standardizes features along a dimension.
`RMSNorm`	Creates a layer that normalizes features along a dimension.

Functions

`ConvNd`	Returns an N-dimensional convolutional layer.
`Patchify`	Returns a patch-to-channel layer.
`Unpatchify`	Returns a channel-to-patch layer.

Descriptions

azula.nn.layers.ConvNd(in_channels, out_channels, spatial=2, identity_init=False, **kwargs)

Returns an N-dimensional convolutional layer.

Parameters:

in_channels (int) – The number of input channels \(C_i\).
out_channels (int) – The number of output channels \(C_o\).
spatial (int) – The number of spatial dimensions \(N\).
identity_init (bool) – Initialize the convolution as a (pseudo-)identity.
kwargs – Keyword arguments passed to the underlying convolution class, e.g. torch.nn.Conv2d.
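
As a rough illustration of the parameters above, the sketch below builds an N-dimensional convolution by dispatching on `spatial` to one of PyTorch's convolution classes. The helper name `conv_nd_sketch`, the dispatch table `CONVS`, and the use of `torch.nn.init.dirac_` for the identity initialization are assumptions made for this example; they are not taken from the library's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dispatch table: one PyTorch convolution class per spatial rank.
CONVS = {1: nn.Conv1d, 2: nn.Conv2d, 3: nn.Conv3d}


def conv_nd_sketch(in_channels, out_channels, spatial=2, identity_init=False, **kwargs):
    """Illustrative stand-in for ConvNd (assumed behavior, not azula's code)."""
    if spatial not in CONVS:
        raise NotImplementedError(f"unsupported number of spatial dimensions: {spatial}")

    conv = CONVS[spatial](in_channels, out_channels, **kwargs)

    if identity_init:
        # Assumption: a Dirac-delta kernel with zero bias stands in for the
        # "(pseudo-)identity" initialization.
        nn.init.dirac_(conv.weight)
        if conv.bias is not None:
            nn.init.zeros_(conv.bias)

    return conv


conv = conv_nd_sketch(4, 4, spatial=3, identity_init=True, kernel_size=3, padding=1)
x = torch.randn(2, 4, 8, 8, 8)  # (batch, channels, depth, height, width)
y = conv(x)                     # same shape as x
```

With matching channel counts, an odd kernel size, and padding that preserves the spatial shape, a layer initialized this way returns its input unchanged before training.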

class azula.nn.layers.ReLU2(*args, **kwargs)

Creates a ReLU² activation layer.

\[y = \max(x, 0)^2\]

References

Primer: Searching for Efficient Transformers for Language Modeling (So et al., 2021)

https://arxiv.org/abs/2109.08668
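
The ReLU² formula can be checked on a few values with plain PyTorch. The function name `relu2` is hypothetical and only mirrors the equation; it is not the library class.

```python
import torch


def relu2(x: torch.Tensor) -> torch.Tensor:
    """Squared ReLU: negative inputs map to 0, positive inputs are squared."""
    return torch.relu(x) ** 2


x = torch.tensor([-2.0, 0.0, 0.5, 3.0])
y = relu2(x)  # values: 0, 0, 0.25, 9
```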

class azula.nn.layers.SwiGLU(*args, **kwargs)

Creates a SwiGLU activation layer.

\[y = x_1 \times x_2 \times \sigma(x_2)\]

References

GLU Variants Improve Transformer (Shazeer, 2020)

https://arxiv.org/abs/2002.05202

forward(x)

Parameters: x (Tensor) – The input tensor \(x\), with shape \((*, 2C)\).
Returns: The output tensor \(y\), with shape \((*, C)\).
Return type: Tensor
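
A minimal sketch of the SwiGLU computation and its shape contract follows. It assumes that \(x_1\) and \(x_2\) are the first and second halves of the last dimension; the documentation does not state which half is which, and the function name `swiglu` is hypothetical.

```python
import torch
import torch.nn.functional as F


def swiglu(x: torch.Tensor) -> torch.Tensor:
    # Assumption: x_1 is the first half of the last dimension, x_2 the second.
    x1, x2 = x.chunk(2, dim=-1)
    # silu(x_2) = x_2 * sigmoid(x_2), so this equals x_1 * x_2 * sigmoid(x_2).
    return x1 * F.silu(x2)


torch.manual_seed(0)
x = torch.randn(4, 6)  # shape (*, 2C) with C = 3
y = swiglu(x)          # shape (*, C)
```

Because the activation halves the last dimension, the preceding linear layer typically needs to produce \(2C\) features to obtain \(C\) outputs.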

class azula.nn.layers.LayerNorm(dim, eps=1e-05)

Creates a layer that standardizes features along a dimension.

\[y = \frac{x - \mathbb{E}[x]}{\sqrt{\mathbb{V}[x] + \epsilon}}\]

References

Layer Normalization (Lei Ba et al., 2016)

https://arxiv.org/abs/1607.06450

Parameters:

dim (int | Sequence[int]) – The dimension(s) to standardize.
eps (float) – A numerical stability term.

forward(x)

Parameters: x (Tensor) – The input tensor \(x\), with shape \((*)\).
Returns: The standardized tensor \(y\), with shape \((*)\).
Return type: Tensor
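
The standardization formula can be sketched directly in PyTorch. This example assumes the population (biased) variance for \(\mathbb{V}[x]\); an implementation may use the unbiased estimator instead. The name `layer_norm_sketch` is hypothetical.

```python
import torch


def layer_norm_sketch(x: torch.Tensor, dim, eps: float = 1e-5) -> torch.Tensor:
    # Assumption: population (biased) variance along `dim`.
    mean = x.mean(dim=dim, keepdim=True)
    var = (x - mean).square().mean(dim=dim, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


torch.manual_seed(0)
x = 3.0 * torch.randn(2, 5, 16) + 1.0  # features with non-zero mean, non-unit scale
y = layer_norm_sketch(x, dim=-1)       # roughly zero mean, unit variance along dim -1
```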

class azula.nn.layers.RMSNorm(dim, eps=1e-05)

Creates a layer that normalizes features along a dimension.

\[y = \frac{x}{\sqrt{\mathbb{E}[x^2] + \epsilon}}\]

References

Root Mean Square Layer Normalization (Zhang et al., 2019)

https://arxiv.org/abs/1910.07467

Parameters:

dim (int | Sequence[int]) – The dimension(s) to normalize.
eps (float) – A numerical stability term.

forward(x)

Parameters: x (Tensor) – The input tensor \(x\), with shape \((*)\).
Returns: The normalized tensor \(y\), with shape \((*)\).
Return type: Tensor
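
The RMS normalization formula can be sketched the same way. The example normalizes over two dimensions at once to illustrate a sequence-valued `dim`; the name `rms_norm_sketch` is hypothetical and the code only mirrors the equation.

```python
import torch


def rms_norm_sketch(x: torch.Tensor, dim, eps: float = 1e-5) -> torch.Tensor:
    # Mean of squares along `dim`; unlike LayerNorm, the mean is not subtracted.
    mean_square = x.square().mean(dim=dim, keepdim=True)
    return x / torch.sqrt(mean_square + eps)


torch.manual_seed(0)
x = 3.0 * torch.randn(2, 5, 16) + 1.0
y = rms_norm_sketch(x, dim=(-2, -1))  # mean of y**2 over the last two dims is close to 1
```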

azula.nn.layers.Patchify(patch_shape, channel_last=False)

Returns a patch-to-channel layer.

Parameters:

patch_shape (Sequence[int]) – The patch shape.
channel_last (bool) – Whether the output channel dimension is last (True) or first (False).
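
To make the patch-to-channel idea concrete, here is a hedged 2-D sketch using only `reshape` and `permute`. The function name `patchify_2d`, the channel-first input layout \((B, C, H, W)\), and the ordering of the merged channel axis as \((C, h, w)\) are assumptions for illustration; the library may order or lay out the result differently.

```python
import torch


def patchify_2d(x: torch.Tensor, patch_shape, channel_last: bool = False) -> torch.Tensor:
    """Hypothetical 2-D patch-to-channel rearrangement (assumed layout)."""
    h, w = patch_shape
    B, C, H, W = x.shape
    # Split each spatial axis into (number of patches, patch size).
    x = x.reshape(B, C, H // h, h, W // w, w)
    if channel_last:
        # Result: (B, H/h, W/w, C*h*w)
        return x.permute(0, 2, 4, 1, 3, 5).reshape(B, H // h, W // w, C * h * w)
    # Result: (B, C*h*w, H/h, W/w)
    return x.permute(0, 1, 3, 5, 2, 4).reshape(B, C * h * w, H // h, W // w)


x = torch.arange(2 * 3 * 8 * 8, dtype=torch.float32).reshape(2, 3, 8, 8)
y = patchify_2d(x, (2, 2))                     # shape (2, 12, 4, 4)
z = patchify_2d(x, (2, 2), channel_last=True)  # shape (2, 4, 4, 12)
```

Each 2×2 patch of each channel is folded into 4 channels, so the spatial resolution drops by the patch size while the channel count grows by the patch area.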

azula.nn.layers.Unpatchify(patch_shape, channel_last=False)

Returns a channel-to-patch layer.

Parameters:

patch_shape (Sequence[int]) – The patch shape.
channel_last (bool) – Whether the input channel dimension is last (True) or first (False).
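
The reverse channel-to-patch rearrangement can be sketched in the same spirit. The function name `unpatchify_2d` and the assumed \((C, h, w)\) ordering of the merged channel axis are illustrative choices, not the library's documented behavior.

```python
import torch


def unpatchify_2d(x: torch.Tensor, patch_shape, channel_last: bool = False) -> torch.Tensor:
    """Hypothetical 2-D channel-to-patch rearrangement (assumed layout)."""
    h, w = patch_shape
    if channel_last:
        # Move channels in front of the spatial axes: (B, H', W', K) -> (B, K, H', W').
        x = x.permute(0, 3, 1, 2)
    B, K, Hp, Wp = x.shape
    C = K // (h * w)
    # Split the merged channel axis back into (C, h, w).
    x = x.reshape(B, C, h, w, Hp, Wp)
    # Interleave patch positions with in-patch offsets: (B, C, H', h, W', w).
    return x.permute(0, 1, 4, 2, 5, 3).reshape(B, C, Hp * h, Wp * w)


z = torch.arange(4.0).reshape(1, 4, 1, 1)  # one pixel with 4 channels
img = unpatchify_2d(z, (2, 2))             # shape (1, 1, 2, 2), values [[0, 1], [2, 3]]
```

Here the four channels of a single pixel are unfolded into one 2×2 patch, filled row by row.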