azula.plugins.sana
Sana plugin.
This plugin depends on diffusers and transformers. To use it,
install the dependencies in your environment
pip install diffusers transformers accelerate
before importing the plugin.
from azula.plugins import sana
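Because the plugin relies on optional packages, it can help to check for them before the import. The following is a minimal sketch using only the standard library; the helper name `missing_dependencies` is hypothetical and not part of azula.

```python
import importlib.util

# Packages named in the install command above.
REQUIRED = ("diffusers", "transformers", "accelerate")


def missing_dependencies(packages=REQUIRED):
    """Return the top-level packages that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Install before importing the plugin:", " ".join(missing))
    else:
        print("All plugin dependencies were found.")
```

The check only looks for the presence of each package; it does not verify versions.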
Classes
AutoEncoder | Creates an auto-encoder wrapper.
TextEncoder | Creates a text encoder.
SanaDenoiser | Creates a Sana denoiser.
Functions
load_model | Loads a pre-trained Sana latent denoiser.
Descriptions
- class azula.plugins.sana.AutoEncoder(ae, scale=1.0)
Creates an auto-encoder wrapper.
- class azula.plugins.sana.TextEncoder(gemma, tokenizer, max_length=300)
Creates a text encoder.
- forward(prompt, instructions=["Given a user prompt, generate an 'Enhanced prompt' that provides detailed visual descriptions suitable for image generation. Evaluate the level of detail in the user prompt:", '- If the prompt is simple, focus on adding specifics about colors, shapes, sizes, textures, and spatial relationships to create vivid and concrete scenes.', '- If the prompt is already detailed, refine and enhance the existing details slightly without overcomplicating.', 'Here are examples of how to transform or refine prompts:', '- User Prompt: A cat sleeping -> Enhanced: A small, fluffy white cat curled up in a round shape, sleeping peacefully on a warm sunny windowsill, surrounded by pots of blooming red flowers.', '- User Prompt: A busy city street -> Enhanced: A bustling city street scene at dusk, featuring glowing street lamps, a diverse crowd of people in colorful clothing, and a double-decker bus passing by towering glass skyscrapers.', 'Please generate only the enhanced description for the prompt below and avoid including any additional commentary or evaluations:', 'User Prompt: '])
- class azula.plugins.sana.SanaDenoiser(backbone, schedule=None)
Creates a Sana denoiser.
- Parameters:
backbone (Module) – A time conditional network.
schedule (Schedule) – A noise schedule. If None, use azula.noise.DecaySchedule instead.
- forward(z_t, t, prompt_embeds, prompt_mask, **kwargs)
- Parameters:
z_t (Tensor) – A noisy tensor \(z_t\), with shape \((B, C, H, W)\).
t (Tensor) – The time \(t\), with shape \(()\) or \((B)\).
prompt_embeds (Tensor) – The Gemma-encoded text prompt \(y\), with shape \((B, L, D)\).
prompt_mask (Tensor) – The text attention mask, with shape \((B, L)\).
kwargs – Optional keyword arguments.
- Returns:
The Gaussian \(\mathcal{N}(Z \mid \mu_\phi(z_t \mid y), \Sigma_\phi(z_t \mid y))\).
- Return type:
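The shape contract of the forward parameters above can be checked before calling the denoiser. This is a sketch operating on plain shape tuples (for tensors, `tuple(x.shape)`); the helper `check_forward_shapes` is hypothetical and not part of azula, and the sizes in the usage lines are illustrative only.

```python
def check_forward_shapes(z_t, t, prompt_embeds, prompt_mask):
    """Check shape tuples against the documented forward() parameters.

    Returns the batch size B on success and raises ValueError otherwise.
    """
    # z_t has shape (B, C, H, W).
    if len(z_t) != 4:
        raise ValueError(f"z_t must have shape (B, C, H, W), got {tuple(z_t)}")
    batch = z_t[0]

    # t has shape () or (B).
    if tuple(t) not in ((), (batch,)):
        raise ValueError(f"t must have shape () or ({batch},), got {tuple(t)}")

    # prompt_embeds has shape (B, L, D).
    if len(prompt_embeds) != 3 or prompt_embeds[0] != batch:
        raise ValueError(f"prompt_embeds must have shape ({batch}, L, D), got {tuple(prompt_embeds)}")

    # prompt_mask has shape (B, L), matching the first two axes of prompt_embeds.
    if tuple(prompt_mask) != tuple(prompt_embeds[:2]):
        raise ValueError(f"prompt_mask must have shape {tuple(prompt_embeds[:2])}, got {tuple(prompt_mask)}")

    return batch


# Illustrative sizes only: B=2, C=32, H=W=32, L=300, D=2304 are assumptions.
batch_size = check_forward_shapes((2, 32, 32, 32), (), (2, 300, 2304), (2, 300))
```

A scalar time broadcasts over the batch, which is why both `()` and `(B,)` are accepted for `t`.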
- azula.plugins.sana.load_model(name, **kwargs)
Loads a pre-trained Sana latent denoiser.
- Parameters:
name (str) – The pre-trained model name.
kwargs – Keyword arguments passed to
diffusers.SanaPipeline.from_pretrained.
- Returns:
A pre-trained latent denoiser and the corresponding auto-encoder and text encoder.
- Return type:
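A call to load_model might be wrapped as follows. This is a sketch under assumptions: the wrapper `load_sana` is hypothetical, the unpacking order follows the wording of the "Returns" entry above rather than a confirmed return type, and no model name is hard-coded because the valid names are not listed here.

```python
def load_sana(name, **kwargs):
    """Load a pre-trained Sana denoiser with its auto-encoder and text encoder.

    Extra keyword arguments are forwarded unchanged to load_model, which in
    turn passes them to diffusers.SanaPipeline.from_pretrained.
    """
    # Imported lazily so that defining this helper does not require the
    # optional dependencies (diffusers, transformers, accelerate).
    from azula.plugins import sana

    # Assumption: the three objects are returned in the order in which the
    # documentation describes them (denoiser, auto-encoder, text encoder).
    denoiser, autoencoder, textencoder = sana.load_model(name, **kwargs)

    return denoiser, autoencoder, textencoder
```

Calling `load_sana` requires the dependencies to be installed and, in all likelihood, the pre-trained weights to be downloaded on first use.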