azula.nn.attention
Attention layers.
Classes
MultiheadSelfAttention | Creates a multi-head self-attention layer.
Descriptions
- class azula.nn.attention.MultiheadSelfAttention(channels, pos_channels=1, attention_heads=1, qkv_bias=True, qk_norm=True, rope=False, dropout=None)
Creates a multi-head self-attention layer.
- Parameters:
channels (int) – The total number of channels \(H \times C\), where \(C\) is the number of channels per head.
pos_channels (int) – The number of positional channels \(P\). Only necessary with RoPE.
attention_heads (int) – The number of attention heads \(H\).
qkv_bias (bool) – Whether to add bias to the query-key-value projection layer or not.
qk_norm (bool) – Whether to use query-key RMS-normalization or not.
rope (bool) – Whether to use rotary positional embedding (RoPE) or not.
dropout (float | None) – The dropout rate in \([0, 1]\).
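The parameters above can be illustrated with a minimal, dependency-free sketch of multi-head self-attention. It is not azula's implementation: the helper names (`split_heads`, `rms_norm`, `softmax`, `self_attention`) are hypothetical, the RMS-normalization has no learned gain, the usual \(1 / \sqrt{C}\) score scaling is assumed, and the learned query-key-value projection, RoPE, and dropout are omitted. It only shows how \(H \times C\) channels are divided among \(H\) heads and where query-key normalization would sit.

```python
import math


def split_heads(x, heads):
    """Split a token of H * C channels into H chunks of C channels."""
    channels = len(x)
    if channels % heads != 0:
        raise ValueError("channels must be divisible by the number of heads")
    c = channels // heads
    return [x[h * c:(h + 1) * c] for h in range(heads)]


def rms_norm(v, eps=1e-6):
    """Rescale v to (approximately) unit root-mean-square; no learned gain."""
    rms = math.sqrt(sum(a * a for a in v) / len(v) + eps)
    return [a / rms for a in v]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v, heads, qk_norm=True):
    """Attend over lists of tokens, each token a list of H * C floats.

    q, k and v are taken as given; a real layer would first compute
    them from the input with a learned projection.
    """
    qh = [split_heads(t, heads) for t in q]  # indexed [token][head][channel]
    kh = [split_heads(t, heads) for t in k]
    vh = [split_heads(t, heads) for t in v]
    if qk_norm:
        # Normalize queries and keys per head before taking dot products.
        qh = [[rms_norm(h) for h in t] for t in qh]
        kh = [[rms_norm(h) for h in t] for t in kh]
    c = len(q[0]) // heads  # channels per head
    out = []
    for i in range(len(q)):
        token = []
        for h in range(heads):
            scores = [
                sum(a * b for a, b in zip(qh[i][h], kh[j][h])) / math.sqrt(c)
                for j in range(len(k))
            ]
            weights = softmax(scores)
            # Weighted average of the value vectors for this head.
            for d in range(c):
                token.append(sum(w * vh[j][h][d] for j, w in enumerate(weights)))
        out.append(token)  # heads are concatenated back to H * C channels
    return out


# Two tokens with 4 channels, split across 2 heads of 2 channels each.
tokens = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
y = self_attention(tokens, tokens, tokens, heads=2)
```

Each output token has the same number of channels as the input, and each output channel is a convex combination of the corresponding value channels, so it stays within their range.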