azula.nn.attention
Attention layers.
Classes
MultiheadSelfAttention | Creates a multi-head self-attention layer.
Descriptions
- class azula.nn.attention.MultiheadSelfAttention(channels, attention_heads=1, qk_norm=True, rpb=False, rope=False, pos_features=None, dropout=None, checkpointing=False)
Creates a multi-head self-attention layer.
- Parameters:
channels (int) – The number of channels \(H \times C\).
attention_heads (int) – The number of attention heads \(H\).
qk_norm (bool) – Whether to use query-key RMS-normalization or not.
rpb (bool) – Whether to use relative positional bias (RPB) or not.
rope (bool) – Whether to use rotary positional embedding (RoPE) or not.
pos_features (int | None) – The number of positional features \(P\). Only necessary when RPB or RoPE is enabled.
dropout (float | None) – The dropout rate in \([0, 1]\).
checkpointing (bool) – Whether to use gradient checkpointing or not.
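To make the shape conventions above concrete, here is a minimal NumPy sketch of multi-head self-attention with query-key RMS normalization. It is not azula's implementation. The function names (rms_norm, softmax, multihead_self_attention), the fused w_qkv projection, the gain-free RMS norm, and the \(1 / \sqrt{C}\) scaling are all assumptions made for illustration. RPB, RoPE, dropout, masking, and checkpointing are left out.

```python
import numpy as np


def rms_norm(x, eps=1e-6):
    """Scale the last axis to unit root-mean-square (no learned gain; an assumption)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_self_attention(x, w_qkv, w_out, heads, qk_norm=True):
    """Hypothetical reference function, not part of azula.

    x:     (*, L, H * C) input tokens
    w_qkv: (H * C, 3 * H * C) fused query/key/value projection
    w_out: (H * C, H * C) output projection
    """
    *batch, L, channels = x.shape
    assert channels % heads == 0, "channels must be divisible by the number of heads"
    C = channels // heads  # channels per head

    # Project once, then split into queries, keys and values.
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)  # each (*, L, H * C)

    def split_heads(t):
        # (*, L, H * C) -> (*, L, H, C) -> (*, H, L, C)
        return np.swapaxes(t.reshape(*batch, L, heads, C), -2, -3)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    if qk_norm:
        # Normalize queries and keys per head before taking dot products.
        q, k = rms_norm(q), rms_norm(k)

    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(C)  # (*, H, L, L)
    y = softmax(scores) @ v  # (*, H, L, C)

    # Merge the heads back: (*, H, L, C) -> (*, L, H * C)
    y = np.swapaxes(y, -2, -3).reshape(*batch, L, channels)
    return y @ w_out


rng = np.random.default_rng(0)
H, C, L = 4, 8, 5
x = rng.standard_normal((2, L, H * C))
w_qkv = rng.standard_normal((H * C, 3 * H * C)) / np.sqrt(H * C)
w_out = rng.standard_normal((H * C, H * C)) / np.sqrt(H * C)

y = multihead_self_attention(x, w_qkv, w_out, heads=H)
print(y.shape)  # (2, 5, 32)
```

In this sketch, channels must be divisible by the number of heads, and the output keeps the input shape \((*, L, H \times C)\). Without any positional information the computation is permutation-equivariant over the \(L\) tokens, which is the gap that options such as RPB and RoPE are meant to fill.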
- forward(x, theta=None, mask=None)
- Parameters:
- Returns:
The output tokens \(y\), with shape \((*, L, H \times C)\).
- Return type: