azula.plugins.eldm

Elucidated latent diffusion model (ELDM or EDM2) plugin.

This plugin depends on the torch_utils and training modules in the NVlabs/edm2 repository. To use it, clone the repository to your machine

git clone https://github.com/NVlabs/edm2

and add it to your Python path before importing the plugin.

import sys; sys.path.append("path/to/edm2")
...
from azula.plugins import eldm

You may also need to install additional dependencies in your environment, including diffusers and accelerate.

pip install diffusers accelerate
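
Before importing the plugin, it can help to confirm that the modules mentioned above are actually visible to Python. The following is a minimal sketch using only the standard library; the helper names missing_modules and add_edm2_to_path are hypothetical and not part of azula, and "path/to/edm2" is the same placeholder as above.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def add_edm2_to_path(path):
    """Append the edm2 checkout to sys.path, at most once."""
    if path not in sys.path:
        sys.path.append(path)


if __name__ == "__main__":
    add_edm2_to_path("path/to/edm2")  # placeholder, replace with your clone
    # torch_utils and training come from the edm2 repository,
    # diffusers and accelerate are installed with pip.
    required = ["torch_utils", "training", "diffusers", "accelerate"]
    print("Missing modules:", missing_modules(required))
```

An empty list suggests that the import of azula.plugins.eldm should at least find its dependencies; it does not check their versions.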

References

Analyzing and Improving the Training Dynamics of Diffusion Models (Karras et al., 2024)

Classes

AutoEncoder

Creates an auto-encoder wrapper.

ElucidatedLatentDenoiser

Creates an elucidated latent denoiser.

Functions

load_model

Loads a pre-trained ELDM (or EDM2) latent denoiser.

Descriptions

class azula.plugins.eldm.AutoEncoder(vae, shift, scale)[source]

Creates an auto-encoder wrapper.

encode(x)[source]

Encodes images to latents.

Parameters:

x (Tensor) – A batch of images \(x\), with shape \((B, 3, 512, 512)\). Pixel values are expected to range between 0 and 1.

Returns:

A batch of latents \(z \sim q(Z \mid x)\), with shape \((B, 4, 64, 64)\).

Return type:

Tensor

decode(z)[source]

Decodes latents to images.

Parameters:

z (Tensor) – A batch of latents \(z\), with shape \((B, 4, 64, 64)\).

Returns:

A batch of images \(x = D(z)\), with shape \((B, 3, 512, 512)\).

Return type:

Tensor
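
The shapes documented for encode and decode can be checked with a small round trip. This is a sketch under assumptions, not part of the plugin: roundtrip is a hypothetical helper, and it only relies on the encode and decode methods and the shapes stated above.

```python
def roundtrip(autoencoder, x):
    """Encode images to latents, decode them back, and check the shapes."""
    batch = x.shape[0]
    # Images are expected as (B, 3, 512, 512) with values in [0, 1].
    assert tuple(x.shape) == (batch, 3, 512, 512), "expected (B, 3, 512, 512)"

    z = autoencoder.encode(x)  # z ~ q(Z | x)
    assert tuple(z.shape) == (batch, 4, 64, 64), "expected (B, 4, 64, 64)"

    x_hat = autoencoder.decode(z)  # x = D(z)
    assert tuple(x_hat.shape) == tuple(x.shape)

    return z, x_hat
```

With PyTorch installed, a call could look like roundtrip(autoencoder, torch.rand(2, 3, 512, 512)), where autoencoder is an AutoEncoder instance. Since encode draws a sample from \(q(Z \mid x)\), the reconstruction is not expected to match the input exactly.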

class azula.plugins.eldm.ElucidatedLatentDenoiser(backbone, schedule=None)[source]

Creates an elucidated latent denoiser.

Parameters:

  • backbone – The backbone network.

  • schedule – The noise schedule. Defaults to None.

forward(z_t, t, label=None, **kwargs)[source]

Parameters:

  • z_t (Tensor) – A noisy tensor \(z_t\), with shape \((B, 4, 64, 64)\).

  • t (Tensor) – The time \(t\), with shape \(()\) or \((B)\).

  • label (Tensor | None) – The class label \(c\) as a one-hot vector, with shape \((*, 1000)\).

  • kwargs – Optional keyword arguments.

Returns:

The Dirac delta \(\delta(Z - \mu_\phi(z_t \mid c))\).

Return type:

DiracPosterior
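
The label argument is a one-hot vector over 1000 classes. To illustrate that layout, here is a dependency-free sketch; one_hot is a hypothetical helper that returns nested Python lists, which would still need to be converted to a Tensor before being passed to forward.

```python
def one_hot(indices, num_classes=1000):
    """Build one-hot rows of length num_classes for integer class indices."""
    rows = []
    for index in indices:
        if not 0 <= index < num_classes:
            raise ValueError(f"class index {index} outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[index] = 1.0  # single active entry marks the class
        rows.append(row)
    return rows


if __name__ == "__main__":
    labels = one_hot([3, 7])  # two labels, shape (2, 1000)
    print(len(labels), len(labels[0]), labels[0].index(1.0))
```

In practice, torch.nn.functional.one_hot(indices, num_classes=1000) produces the same layout directly as an integer tensor, which may need to be cast to the dtype of z_t.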

azula.plugins.eldm.load_model(name)[source]

Loads a pre-trained ELDM (or EDM2) latent denoiser.

Parameters:

name (str) – The pre-trained model name.

Returns:

A pre-trained latent denoiser and the corresponding auto-encoder.

Return type:

tuple[Denoiser, AutoEncoder]
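
Putting the pieces together, the returned pair can be used to denoise a latent and decode the estimate to an image. The sketch below makes several assumptions that the text above does not confirm: the denoiser is callable with the same arguments as forward, the returned DiracPosterior exposes its point estimate as a mean attribute, and the load_model keyword of this hypothetical helper exists only so the function can be exercised without the real plugin. The model name is left to the caller because no valid names are listed here.

```python
def denoise_and_decode(name, z_t, t, label=None, load_model=None):
    """Load a pre-trained model, denoise z_t at time t, and decode the result."""
    if load_model is None:
        # Requires the edm2 repository on the Python path (see above).
        from azula.plugins import eldm

        load_model = eldm.load_model

    # load_model returns the latent denoiser and its auto-encoder.
    denoiser, autoencoder = load_model(name)

    # Assumption: calling the denoiser dispatches to forward(z_t, t, label=...).
    posterior = denoiser(z_t, t, label=label)

    # Assumption: the Dirac posterior stores its estimate in `.mean`.
    return autoencoder.decode(posterior.mean)
```

Here z_t is a noisy latent with shape \((B, 4, 64, 64)\), and the result is a batch of images with shape \((B, 3, 512, 512)\). For repeated calls, load the model once and reuse the pair instead of calling this helper in a loop.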