azula.nn.attention

Attention layers.

Classes

class azula.nn.attention.MultiheadSelfAttention(channels, attention_heads=1, qk_norm=True, dropout=None, checkpointing=False)

Creates a multi-head self-attention layer.

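Parameters:

channels (int) – The number of channels \(H \times C\).
attention_heads (int) – The number of attention heads \(H\).
qk_norm (bool) – Whether to normalize the queries and keys.
dropout (float | None) – Optional dropout rate. If None, no dropout is applied.
checkpointing (bool) – Whether to use gradient checkpointing.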

forward(x, theta=None, mask=None)

Parameters:

x (Tensor) – The input tokens \(x\), with shape \((*, L, H \times C)\).
theta (Tensor | None) – Optional rotary positional embedding \(\theta\), with shape \((*, L, H \times C / 2)\).
mask (Tensor | None) – Optional attention mask, with shape \((L, L)\).

Returns:

The output tokens \(y\), with shape \((*, L, H \times C)\).

Return type:

Tensor
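
Example:

The shape conventions above can be illustrated with a minimal sketch in plain PyTorch. This is not the implementation of MultiheadSelfAttention: the helper toy_self_attention is hypothetical, and it omits the learned projections, query-key normalization, dropout and rotary embedding \(\theta\). It only shows how tokens of shape \((*, L, H \times C)\) are split into \(H\) heads, attended with an \((L, L)\) mask, and merged back. The mask is assumed to be boolean, with True marking allowed positions, which is the convention of torch.nn.functional.scaled_dot_product_attention and may differ from this layer's.

```python
import torch
import torch.nn.functional as F


def toy_self_attention(x, heads, mask=None):
    """Hypothetical helper: self-attention without learned projections.

    x:    (*, L, H * C) input tokens
    mask: optional (L, L) boolean mask, True where attention is allowed
          (assumed convention, taken from PyTorch's SDPA)
    """
    *batch, L, HC = x.shape
    C = HC // heads

    # Split channels into heads: (*, L, H * C) -> (*, H, L, C)
    q = k = v = x.reshape(*batch, L, heads, C).transpose(-3, -2)

    # The (L, L) mask broadcasts over the batch and head dimensions
    y = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

    # Merge heads back: (*, H, L, C) -> (*, L, H * C)
    return y.transpose(-3, -2).reshape(*batch, L, HC)


H, C, L = 4, 16, 10
x = torch.randn(2, L, H * C)

# Causal mask: token i may attend to tokens 0..i
causal = torch.tril(torch.ones(L, L, dtype=torch.bool))

y = toy_self_attention(x, heads=H, mask=causal)
print(y.shape)  # same shape as x
```

The output keeps the input shape, as documented for forward. With the causal mask, the first token attends only to itself, so in this projection-free sketch its output equals its input.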