azula.plugins.eldm

Elucidated latent diffusion model (ELDM or EDM2) plugin.

This plugin depends on the torch_utils and training modules of the NVlabs/edm2 repository. To use it, clone the repository to your machine:

git clone https://github.com/NVlabs/edm2

Then add it to your Python path before importing the plugin:

import sys
sys.path.append("path/to/edm2")
...
from azula.plugins import eldm
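A slightly more defensive variant of this setup step is sketched below. It assumes the clone keeps the repository layout, with torch_utils and training as top-level packages; the helper name add_edm2_to_path is hypothetical and not part of azula.

```python
import importlib.util
import sys


def add_edm2_to_path(path: str) -> bool:
    """Hypothetical helper: append an edm2 clone to sys.path and report
    whether the modules the plugin depends on are now discoverable."""
    if path not in sys.path:
        sys.path.append(path)  # avoid duplicate entries on repeated calls
    # The plugin relies on these two top-level packages of NVlabs/edm2.
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("torch_utils", "training")
    )


# Usage ("path/to/edm2" is a placeholder):
# if add_edm2_to_path("path/to/edm2"):
#     from azula.plugins import eldm
```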

You may also need to install additional dependencies, including diffusers and accelerate.

pip install diffusers accelerate

References

Analyzing and Improving the Training Dynamics of Diffusion Models (Karras et al., 2024)

https://arxiv.org/abs/2312.02696

Classes

`AutoEncoder`	Creates a standardized auto-encoder.
`ElucidatedLatentDenoiser`	Creates an elucidated latent denoiser.

Functions

`list_models`	Returns the list of available pre-trained models.
`load_model`	Loads a pre-trained ELDM (or EDM2) latent denoiser.

Descriptions

class azula.plugins.eldm.AutoEncoder(vae, shift, scale)

Creates a standardized auto-encoder.

Parameters:

vae (Module) – A (variational) auto-encoder.
shift (Tensor) – The shift to apply to latents, with shape \((C, 1, 1)\).
scale (Tensor) – The scale to apply to latents, with shape \((C, 1, 1)\).
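The role of shift and scale can be illustrated with a small NumPy sketch. It assumes standardization of the form \((z - \text{shift}) / \text{scale}\) with per-channel statistics broadcast over \((C, 1, 1)\); the exact convention used inside AutoEncoder is not stated here, so the functions below are hypothetical.

```python
import numpy as np


def standardize(z, shift, scale):
    # Assumed convention: subtract the per-channel shift, then divide by the
    # per-channel scale. shift and scale have shape (C, 1, 1) and broadcast
    # over latents of shape (B, C, H, W).
    return (z - shift) / scale


def unstandardize(z_std, shift, scale):
    # Inverse of standardize, as would be applied before the VAE decoder.
    return z_std * scale + shift


# Toy latents with a non-zero mean and non-unit spread in each channel.
rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=3.0, size=(8, 4, 64, 64))
shift = z.mean(axis=(0, 2, 3)).reshape(4, 1, 1)
scale = z.std(axis=(0, 2, 3)).reshape(4, 1, 1)
z_std = standardize(z, shift, scale)
```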

encode(x)

Encodes images to latents.

Parameters: x (Tensor) – A batch of images \(x\), with shape \((B, 3, 512, 512)\). Pixel values are expected to range between 0 and 1.
Returns: A batch of latents \(z \sim q(Z \mid x)\), with shape \((B, 4, 64, 64)\).
Return type: Tensor
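Since encode expects pixel values in \([0, 1]\) and a channel-first layout, 8-bit images usually need converting first. The sketch below assumes images arrive as a NumPy array of shape \((B, 512, 512, 3)\) with dtype uint8; to_encoder_input is a hypothetical helper, and the result would still need wrapping with torch.from_numpy before calling encode.

```python
import numpy as np


def to_encoder_input(images):
    """Hypothetical helper: (B, H, W, 3) uint8 -> (B, 3, H, W) float32 in [0, 1]."""
    if images.dtype != np.uint8:
        raise TypeError("expected uint8 pixel values")
    x = images.astype(np.float32) / 255.0  # map 0..255 onto [0, 1]
    # Move channels first and return a contiguous copy for tensor conversion.
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))
```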

decode(z)

Decodes latents to images.

Parameters: z (Tensor) – A batch of latents \(z\), with shape \((B, 4, 64, 64)\).
Returns: A batch of images \(x = d(z)\), with shape \((B, 3, 512, 512)\).
Return type: Tensor
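Going the other way, decoded images can be turned back into 8-bit arrays for saving or display. The sketch assumes the decoder output has already been moved to a NumPy array; because a decoder is not guaranteed to stay inside \([0, 1]\), values are clipped before quantization. to_uint8_images is a hypothetical helper.

```python
import numpy as np


def to_uint8_images(x):
    """Hypothetical helper: (B, 3, H, W) floats in [0, 1] -> (B, H, W, 3) uint8."""
    x = np.clip(x, 0.0, 1.0)  # decoders can slightly overshoot the valid range
    x = np.rint(x * 255.0).astype(np.uint8)  # round to the nearest 8-bit level
    return x.transpose(0, 2, 3, 1)  # channels last, as most image libraries expect
```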

class azula.plugins.eldm.ElucidatedLatentDenoiser(backbone, schedule=None)

Creates an elucidated latent denoiser.

\[\begin{split}\mu_\phi(x_t \mid c) & = (1 - \omega) \, b_\phi(x_t, \sigma_t) + \omega \, b_\phi(x_t, \sigma_t \mid c) \\ \sigma^2_\phi(x_t \mid c) & = \frac{\sigma_t^2}{1 + \sigma_t^2}\end{split}\]

where \(\omega \in \mathbb{R}_+\) is the classifier-free guidance strength.
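The two equations above translate directly into array arithmetic. In this NumPy sketch, b_uncond and b_cond stand in for the backbone outputs \(b_\phi(x_t, \sigma_t)\) and \(b_\phi(x_t, \sigma_t \mid c)\); the function name guided_gaussian is hypothetical, and the actual denoiser performs this computation inside forward.

```python
import numpy as np


def guided_gaussian(b_uncond, b_cond, sigma_t, omega):
    """Mean and variance of the guided denoiser, following the equations above."""
    # Classifier-free guidance: interpolate (or extrapolate) between predictions.
    mean = (1.0 - omega) * b_uncond + omega * b_cond
    # The variance depends only on the noise level sigma_t.
    var = sigma_t**2 / (1.0 + sigma_t**2)
    return mean, var


# Toy backbone outputs standing in for the unconditional and conditional predictions.
b_uncond = np.array([0.0, 1.0])
b_cond = np.array([1.0, 3.0])
```

With \(\omega = 0\) the mean is the unconditional prediction, with \(\omega = 1\) it is the conditional one, and larger values extrapolate away from the unconditional prediction.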

Parameters:

backbone (Module) – A noise conditional network \(b_\phi(x_t, \sigma_t \mid c)\).
schedule (Schedule) – A variance exploding (VE) schedule. If None, azula.plugins.edm.ElucidatedSchedule is used instead.
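To build intuition about the default schedule, the sketch below implements the noise-level interpolation from the original EDM paper (Karras et al., 2022) as a VE schedule with \(\alpha_t = 1\). It assumes that ElucidatedSchedule follows this form with \(\sigma_{\min} = 0.002\), \(\sigma_{\max} = 80\) and \(\rho = 7\), the values proposed in that paper; azula's actual parametrization and defaults may differ, and both function names are hypothetical.

```python
import numpy as np


def elucidated_sigma(t, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Assumed noise level sigma_t for t in [0, 1], interpolated in sigma^(1/rho) space."""
    t = np.asarray(t, dtype=np.float64)
    lo, hi = sigma_min ** (1.0 / rho), sigma_max ** (1.0 / rho)
    # Linear in sigma^(1/rho), so more resolution is spent at small noise levels.
    return (lo + t * (hi - lo)) ** rho


def ve_schedule(t):
    # Variance exploding: the signal is never scaled (alpha_t = 1), only noised.
    sigma = elucidated_sigma(t)
    return np.ones_like(sigma), sigma
```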

forward(x_t, t, label=None, omega=None, **kwargs)

Parameters:

x_t (Tensor) – A noisy vector \(x_t\), with shape \((*, D)\).
t (Tensor) – The time \(t\), with shape \((*)\).
label (Tensor | None) – The class label \(c\) as a one-hot vector.
omega (Tensor | None) – The classifier-free guidance strength \(\omega \in \mathbb{R}\). If None, classifier-free guidance is not applied.
kwargs – Optional keyword arguments.

Returns:

The Gaussian \(\mathcal{N}(X \mid \mu_\phi(x_t \mid c), \sigma^2_\phi(x_t \mid c))\).

Return type:

Gaussian
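forward takes the class label as a one-hot vector. A minimal sketch of building such labels from integer class indices follows; it assumes 1000 classes, as for ImageNet, but the right size depends on the backbone, and one_hot_labels is a hypothetical helper. The resulting array would be converted to a tensor and passed as label, optionally alongside a guidance strength omega.

```python
import numpy as np


def one_hot_labels(class_ids, num_classes=1000):
    """Hypothetical helper: integer class indices (B,) -> one-hot float32 (B, num_classes)."""
    class_ids = np.asarray(class_ids)
    if class_ids.min() < 0 or class_ids.max() >= num_classes:
        raise ValueError("class index out of range")
    labels = np.zeros((class_ids.shape[0], num_classes), dtype=np.float32)
    # Set a single 1 per row, at the column given by that row's class index.
    labels[np.arange(class_ids.shape[0]), class_ids] = 1.0
    return labels
```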

azula.plugins.eldm.list_models()

Returns the list of available pre-trained models.

azula.plugins.eldm.load_model(key)

Loads a pre-trained ELDM (or EDM2) latent denoiser.

Parameters: key (str) – The pre-trained model key.
Returns: A pre-trained latent denoiser and the corresponding auto-encoder.
Return type: Tuple[GaussianDenoiser, AutoEncoder]
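A minimal sketch of how the returned auto-encoder can be exercised is given below. It uses only list_models, load_model, encode and decode as documented on this page, and assumes the edm2 setup above has been done and that images is a float tensor in \([0, 1]\) of shape \((B, 3, 512, 512)\) on the same device as the models; reconstruct is a hypothetical name. Imports are deferred inside the function so that defining it does not require the plugin.

```python
def reconstruct(images, key=None):
    """Hypothetical helper: encode then decode a batch with the auto-encoder
    returned by load_model, e.g. to check the setup before sampling."""
    import torch
    from azula.plugins import eldm

    if key is None:
        key = eldm.list_models()[0]  # any available pre-trained model key
    _denoiser, autoencoder = eldm.load_model(key)  # the denoiser is unused here

    with torch.no_grad():
        z = autoencoder.encode(images)  # latents, shape (B, 4, 64, 64)
        x = autoencoder.decode(z)  # images, shape (B, 3, 512, 512)
    return z, x
```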