azula.nn.layers

Common layers.

Classes

LayerNorm

Creates a layer that standardizes features along a dimension.

RMSNorm

Creates a layer that normalizes features along a dimension.

ReLU2

Creates a ReLU² activation layer.

SineEncoding

Creates a sinusoidal positional encoding.

SwiGLU

Creates a SwiGLU activation layer.

Functions

ConvNd

Returns an N-dimensional convolutional layer.

Patchify

Returns a patch-to-channel layer.

Unpatchify

Returns a channel-to-patch layer.

Descriptions

azula.nn.layers.ConvNd(in_channels, out_channels, spatial=2, identity_init=False, **kwargs)

Returns an N-dimensional convolutional layer.

Parameters:
  • in_channels (int) – The number of input channels \(C_i\).

  • out_channels (int) – The number of output channels \(C_o\).

  • spatial (int) – The number of spatial dimensions \(N\).

  • identity_init (bool) – Whether to initialize the convolution as a (pseudo-)identity mapping.

  • kwargs – Keyword arguments passed to the underlying PyTorch convolution module (e.g. torch.nn.Conv2d when \(N = 2\)).
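
The following pure-Python sketch illustrates what a (pseudo-)identity initialization can mean in the 1-D case. It does not depend on azula or PyTorch. It assumes a kernel that is zero except for a 1 at the spatial center, which links output channel \(i\) to input channel \(i \bmod C_i\), together with length-preserving zero padding. The helpers `identity_kernel` and `conv1d_same` are hypothetical and are not necessarily how azula builds its weights.

```python
from typing import List

Signal = List[List[float]]  # shape (C, L): channels x positions
Kernel = List[List[List[float]]]  # shape (C_o, C_i, k)


def identity_kernel(c_in: int, c_out: int, k: int) -> Kernel:
    """Hypothetical (pseudo-)identity kernel.

    Output channel i copies input channel i % c_in through the kernel center.
    """
    assert k % 2 == 1, "an odd kernel size has a well-defined center"
    w = [[[0.0] * k for _ in range(c_in)] for _ in range(c_out)]
    for i in range(c_out):
        w[i][i % c_in][k // 2] = 1.0
    return w


def conv1d_same(x: Signal, w: Kernel) -> Signal:
    """Naive 1-D cross-correlation with zero padding that preserves length."""
    c_in, length = len(x), len(x[0])
    k = len(w[0][0])
    pad = k // 2
    out = []
    for w_o in w:  # one output channel at a time
        row = []
        for t in range(length):
            acc = 0.0
            for c in range(c_in):
                for j in range(k):
                    s = t + j - pad  # input position under kernel tap j
                    if 0 <= s < length:
                        acc += w_o[c][j] * x[c][s]
            row.append(acc)
        out.append(row)
    return out


x = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
y = conv1d_same(x, identity_kernel(c_in=2, c_out=3, k=3))
print(y)  # channels 0 and 1 reproduce x; channel 2 repeats channel 0
```

With this construction, the layer is an exact identity when \(C_o = C_i\). Otherwise, input channels are repeated or dropped, which is one plausible reading of "pseudo-identity".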

class azula.nn.layers.LayerNorm(dim, eps=1e-05)

Creates a layer that standardizes features along a dimension.

\[y = \frac{x - \mathbb{E}[x]}{\sqrt{\mathbb{V}[x] + \epsilon}}\]

References

Layer Normalization (Lei Ba et al., 2016)

Parameters:
  • dim (int | Sequence[int]) – The dimension(s) to standardize.

  • eps (float) – A numerical stability term.

forward(x)
Parameters:

x (Tensor) – The input tensor \(x\), with shape \((*)\).

Returns:

The standardized tensor \(y\), with shape \((*)\).

Return type:

Tensor
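
The formula can be checked with a minimal pure-Python sketch that standardizes a single feature vector. This covers the case where `dim` is the last and only dimension. The sketch assumes \(\mathbb{V}[x]\) is the biased (population) variance, as in torch.nn.LayerNorm; that choice, like the function name `layer_norm`, is an assumption for illustration.

```python
import math
from typing import List


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Standardize features: y = (x - E[x]) / sqrt(V[x] + eps)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # assumed biased (population) variance
    return [(v - mean) / math.sqrt(var + eps) for v in x]


y = layer_norm([1.0, 2.0, 3.0, 4.0])
print(sum(y) / len(y))  # close to 0
print(sum(v * v for v in y) / len(y))  # close to 1 (slightly below because of eps)
```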

azula.nn.layers.Patchify(patch_shape, channel_last=False)

Returns a patch-to-channel layer.

Parameters:
  • patch_shape (Sequence[int]) – The patch shape.

  • channel_last (bool) – If True, the channel dimension of the output is last; otherwise, it is first.
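
For intuition, here is a pure-Python sketch of the 1-D case on nested lists. It assumes a channel-first input of shape \((C, L \cdot l)\), where \(l\) is the patch size and \(L\) is the number of patches. The ordering of the merged channel dimension is an assumption: channel \(c \cdot l + j\) holds offset \(j\) of each patch of channel \(c\). `patchify_1d` is a hypothetical helper, not azula's implementation.

```python
from typing import List


def patchify_1d(
    x: List[List[float]], l: int, channel_last: bool = False
) -> List[List[float]]:
    """Hypothetical 1-D patchify: (C, L*l) -> (C*l, L), or (L, C*l) if channel_last."""
    c_in, length = len(x), len(x[0])
    assert length % l == 0, "length must be divisible by the patch size"
    n = length // l  # number of patches L
    # Row c*l + j collects offset j of every patch of input channel c.
    out = [[x[c][p * l + j] for p in range(n)] for c in range(c_in) for j in range(l)]
    if channel_last:
        # Move the merged channel dimension to the end: (L, C*l).
        return [list(col) for col in zip(*out)]
    return out


x = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]  # C = 1, length 6
print(patchify_1d(x, 2))  # [[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]]
```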

class azula.nn.layers.RMSNorm(dim, eps=1e-05)

Creates a layer that normalizes features along a dimension.

\[y = \frac{x}{\sqrt{\mathbb{E}[x^2] + \epsilon}}\]

References

Root Mean Square Layer Normalization (Zhang et al., 2019)

Parameters:
  • dim (int | Sequence[int]) – The dimension(s) to normalize.

  • eps (float) – A numerical stability term.

forward(x)
Parameters:

x (Tensor) – The input tensor \(x\), with shape \((*)\).

Returns:

The normalized tensor \(y\), with shape \((*)\).

Return type:

Tensor
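
The following is a minimal pure-Python sketch of the formula for a single feature vector; the function name `rms_norm` is hypothetical. Unlike the standardization in LayerNorm, no mean is subtracted. As a result, a constant input is rescaled to roughly \(\pm 1\) rather than mapped to zero.

```python
import math
from typing import List


def rms_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize features: y = x / sqrt(E[x^2] + eps)."""
    mean_sq = sum(v * v for v in x) / len(x)  # E[x^2], the squared RMS
    return [v / math.sqrt(mean_sq + eps) for v in x]


print(rms_norm([2.0, 2.0, 2.0]))  # each entry close to 1.0
print(rms_norm([3.0, -4.0]))  # the root mean square of the output is close to 1
```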

class azula.nn.layers.ReLU2(*args, **kwargs)

Creates a ReLU² activation layer.

\[y = \max(x, 0)^2\]

References

Primer: Searching for Efficient Transformers for Language Modeling (So et al., 2021)
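
The activation is simple enough to state directly. The pure-Python sketch below applies it elementwise; the function name `relu2` is hypothetical. Negative inputs map to zero and positive inputs are squared, so unlike ReLU, the output grows quadratically.

```python
def relu2(x: float) -> float:
    """ReLU squared: y = max(x, 0) ** 2."""
    return max(x, 0.0) ** 2


print([relu2(v) for v in (-2.0, 0.0, 1.5, 3.0)])  # [0.0, 0.0, 2.25, 9.0]
```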

class azula.nn.layers.SineEncoding(features, omega=10000.0)

Creates a sinusoidal positional encoding.

\[\begin{split}e_{2i} & = \sin \left( x \times \omega^\frac{-2i}{D} \right) \\ e_{2i+1} & = \cos \left( x \times \omega^\frac{-2i}{D} \right)\end{split}\]

References

Attention Is All You Need (Vaswani et al., 2017)

Parameters:
  • features (int) – The number of embedding features \(D\). Must be even.

  • omega (float) – The frequency base \(\omega\). The angular frequencies \(\omega^{-2i/D}\) decrease geometrically from 1 toward \(1 / \omega\).

forward(x)
Parameters:

x (Tensor) – The position \(x\), with shape \((*)\).

Returns:

The embedding vector \(e\), with shape \((*, D)\).

Return type:

Tensor
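
As a sketch of the formula for a single scalar position, the pure-Python function below builds the \(D\)-dimensional embedding; the name `sine_encoding` is hypothetical. It follows the interleaved sine/cosine layout written in the formula. Whether the layer stores its components in this exact order is an assumption.

```python
import math
from typing import List


def sine_encoding(x: float, features: int, omega: float = 1e4) -> List[float]:
    """Encode a scalar position x into a vector e of length D = features.

    e[2i] = sin(x * omega ** (-2i / D)), e[2i + 1] = cos(x * omega ** (-2i / D))
    """
    assert features % 2 == 0, "features D must be even"
    e = []
    for i in range(features // 2):
        freq = omega ** (-2 * i / features)  # decreases geometrically from 1 toward 1 / omega
        e.append(math.sin(x * freq))
        e.append(math.cos(x * freq))
    return e


print(sine_encoding(0.0, features=4))  # [0.0, 1.0, 0.0, 1.0]
```

Applying this to every entry of a position tensor with shape \((*)\) yields the documented output shape \((*, D)\).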

class azula.nn.layers.SwiGLU(*args, **kwargs)

Creates a SwiGLU activation layer.

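\[y = x_1 \times x_2 \times \sigma(x_2)\]

where \(x_1\) and \(x_2\) are the two halves of \(x\) along its last dimension and \(\sigma\) is the sigmoid function.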

References

GLU Variants Improve Transformer (Shazeer, 2020)

forward(x)
Parameters:

x (Tensor) – The input tensor \(x\), with shape \((*, 2C)\).

Returns:

The output tensor \(y\), with shape \((*, C)\).

Return type:

Tensor
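
The pure-Python sketch below maps a vector of length \(2C\) to length \(C\); the function names are hypothetical. It assumes \(x_1\) is the first half of the input and \(x_2\), the gated half passed through the sigmoid, is the second. A numerically stable sigmoid avoids overflow for large negative inputs.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def swiglu(x: List[float]) -> List[float]:
    """Map a vector of length 2C to length C: y = x1 * x2 * sigmoid(x2).

    Assumption: x1 is the first half of x and x2 the second half.
    """
    assert len(x) % 2 == 0, "input length must be even (2C)"
    c = len(x) // 2
    x1, x2 = x[:c], x[c:]
    return [a * b * sigmoid(b) for a, b in zip(x1, x2)]


print(swiglu([1.0, 2.0, 0.0, 0.0]))  # [0.0, 0.0] because x2 = 0
```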

azula.nn.layers.Unpatchify(patch_shape, channel_last=False)

Returns a channel-to-patch layer.

Parameters:
  • patch_shape (Sequence[int]) – The patch shape.

  • channel_last (bool) – If True, the channel dimension of the input is last; otherwise, it is first.
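
To illustrate the channel-to-patch direction, here is a pure-Python sketch of the 1-D case. It assumes an input of shape \((C \cdot l, L)\), or \((L, C \cdot l)\) when `channel_last=True`. It also assumes channel \(c \cdot l + j\) holds offset \(j\) of each patch of output channel \(c\). This channel ordering and the helper `unpatchify_1d` are assumptions for illustration, not azula's implementation.

```python
from typing import List


def unpatchify_1d(
    y: List[List[float]], l: int, channel_last: bool = False
) -> List[List[float]]:
    """Hypothetical 1-D unpatchify: (C*l, L) -> (C, L*l).

    With channel_last=True the input is read as (L, C*l) instead.
    """
    if channel_last:
        y = [list(row) for row in zip(*y)]  # (L, C*l) -> (C*l, L)
    assert len(y) % l == 0, "channel count must be divisible by the patch size"
    c_out, n = len(y) // l, len(y[0])
    # Position p*l + j of output channel c comes from row c*l + j, patch p.
    return [[y[c * l + j][p] for p in range(n) for j in range(l)] for c in range(c_out)]


print(unpatchify_1d([[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]], 2))  # [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
```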