Second-order MeanFlow: Combining Forward and Backward Distillation
In this post, we show how the MeanFlow [1] (backward distillation) identity and the tri-consistency constraint can be combined to produce a second-order identity for the student velocity in diffusion/flow distillation. As in MeanFlow, the second-order MeanFlow target is remarkably simple: the average of the two endpoint teacher velocities plus a second-order correction term.
1. Setup: Notation and Goals
- $x_t$ denotes the state at time $t$ on the diffusion/flow trajectory.
- $v_\phi(x_t,t)$ is the pre-trained teacher velocity field (teacher’s instantaneous velocity at $(x_t,t)$).
- $v_\theta(x_t,t,s)$ is the student average velocity, intended to take a sample at time $t$ to time $s$ in a single step (a shortcut/one-step generator). We assume $s > t$ throughout.
- We will often use small increments $ds$ with $s_2 = s_1 + ds$.
Two consistency principles underlie the derivation:
- MeanFlow identity (backward distillation): for $s > t$,
$$ v(x_t,t,s) = v_\phi(x_t,t) + (s-t)\frac{d}{dt}\bigl[v(x_t,t,s)\bigr], $$
which expresses the student average velocity from time $t$ to time $s$ in terms of the teacher's instantaneous velocity at $t$ and the time derivative of the student velocity with respect to the starting time $t$. Here $\frac{d}{dt}$ is the total derivative along the teacher trajectory, so $x_t$ moves with $t$ while $s$ is held fixed.
- Tri-consistency (additivity of short segments): for $s_2 > s_1 > t$,
$$ (s_1-t)\,v(x_t,t,s_1) + (s_2-s_1)\,v(x_{s_1},s_1,s_2) \;=\; (s_2-t)\,v(x_t,t,s_2). $$
This says that two consecutive short steps (from $t$ to $s_1$, then from $s_1$ to $s_2$) should produce the same displacement as a single step from $t$ to $s_2$.
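The two principles above can be sanity-checked numerically on a toy problem. The sketch below is illustrative only: it assumes a hypothetical scalar teacher ODE $dx/dt = A\,x$ (the rate `A`, the sample point, and all function names are made up for this example), for which the exact average velocity is known in closed form. The total derivative $\frac{d}{dt}$ is approximated by a central finite difference along the teacher trajectory.

```python
import math

A = 0.7  # hypothetical teacher rate for the toy ODE dx/dt = A * x


def teacher_velocity(x, t):
    """Instantaneous teacher velocity v_phi(x, t) of the toy ODE."""
    return A * x


def flow(x, t, s):
    """Exact teacher flow map: the state at time s, starting from x at time t."""
    return x * math.exp(A * (s - t))


def avg_velocity(x, t, s):
    """Exact average velocity from t to s: displacement divided by elapsed time."""
    return (flow(x, t, s) - x) / (s - t)


def total_dt(x, t, s, h=1e-5):
    """Central difference of d/dt v(x_t, t, s), moving x_t along the teacher trajectory."""
    up = avg_velocity(flow(x, t, t + h), t + h, s)
    down = avg_velocity(flow(x, t, t - h), t - h, s)
    return (up - down) / (2 * h)


x_t, t, s1, s2 = 1.3, 0.2, 0.5, 0.9

# MeanFlow (backward) identity: v = v_phi + (s - t) * dv/dt
meanflow_residual = avg_velocity(x_t, t, s1) - (
    teacher_velocity(x_t, t) + (s1 - t) * total_dt(x_t, t, s1)
)

# Tri-consistency: two short steps add up to one long step
x_s1 = flow(x_t, t, s1)
tri_residual = (
    (s1 - t) * avg_velocity(x_t, t, s1)
    + (s2 - s1) * avg_velocity(x_s1, s1, s2)
    - (s2 - t) * avg_velocity(x_t, t, s2)
)

print(f"MeanFlow residual:        {meanflow_residual:.2e}")
print(f"Tri-consistency residual: {tri_residual:.2e}")
```

Both residuals should be close to zero (up to finite-difference and floating-point error), since the exact average velocity of the teacher satisfies both constraints. In actual distillation the student $v_\theta$ only satisfies them approximately, and the derivative would typically be computed with automatic differentiation instead of finite differences.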
Goal: derive a practical target for $v_\theta(x_t,t,s)$ that accounts for both the forward and backward constraints.
2. An Intuitive Derivation
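Start from tri-consistency and replace the second segment ($s_1$ to $s_2$) with a teacher PF-ODE simulation, so that the displacement over $[s_1, s_2]$ is the one produced by the teacher:
$$ (s_1-t)\,v(x_t,t,s_1) + \int_{s_1}^{s_2} v_\phi(x_\tau,\tau)\,d\tau \;=\; (s_2-t)\,v(x_t,t,s_2). $$
For a small step $ds = s_2 - s_1$, the one-step Euler approximation to the teacher ODE gives
$$ (s_1-t)\,v(x_t,t,s_1) + ds\,v_\phi(x_{s_1},s_1) \;=\; (s_2-t)\,v(x_t,t,s_2) + O(ds^2). $$
Now set $s_1 = s$ and $s_2 = s + ds$. Substitute the MeanFlow (backward) identity for the two student velocities $v(x_t,t,s)$ and $v(x_t,t,s+ds)$, and expand $\frac{d}{dt}v(x_t,t,s+ds)$ to first order in $ds$. The terms of order zero in $ds$ cancel; dividing what remains by $ds$ and letting $ds \to 0$, we arrive at the following mixed relation:
$$ v_\phi(x_s,s) \;=\; v_\phi(x_t,t) + 2(s-t)\frac{d}{dt}v(x_t,t,s) + (s-t)^2\frac{d^2}{dt\,ds}v(x_t,t,s). $$
Applying the MeanFlow identity once more, in the form $(s-t)\frac{d}{dt}v(x_t,t,s) = v(x_t,t,s) - v_\phi(x_t,t)$, eliminates the first time derivative. Rearranging to isolate $v(x_t,t,s)$ yields the second-order MeanFlow identity:
$$ v(x_t,t,s) \;=\; \frac{1}{2}\bigl[v_\phi(x_t,t) + v_\phi(x_s,s)\bigr] - \frac{1}{2}(s-t)^2\frac{d^2}{dt\,ds}v(x_t,t,s). \tag{$\star$} $$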
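As a hedged numerical check of the second-order identity, the sketch below evaluates it on a hypothetical scalar teacher ODE $dx/dt = A\,x$, whose exact average velocity is known. It assumes the identity takes the form $v = \tfrac{1}{2}[v_\phi(x_t,t) + v_\phi(x_s,s)] - \tfrac{1}{2}(s-t)^2 \frac{d^2}{dt\,ds}v$, which is reconstructed from the derivation steps in this section; the sign and the factor $\tfrac{1}{2}$ should be read as assumptions that the toy supports, not as a proof. All names and constants are illustrative, and derivatives are approximated by nested central differences.

```python
import math

A = 0.7  # hypothetical teacher rate for the toy ODE dx/dt = A * x


def teacher_velocity(x, t):
    """Instantaneous teacher velocity v_phi(x, t) of the toy ODE."""
    return A * x


def flow(x, t, s):
    """Exact teacher flow map from time t to time s."""
    return x * math.exp(A * (s - t))


def avg_velocity(x, t, s):
    """Exact average velocity from t to s (stands in for a perfect student)."""
    return (flow(x, t, s) - x) / (s - t)


def total_dt(x, t, s, h=1e-4):
    """Central difference of d/dt v(x_t, t, s) along the teacher trajectory."""
    up = avg_velocity(flow(x, t, t + h), t + h, s)
    down = avg_velocity(flow(x, t, t - h), t - h, s)
    return (up - down) / (2 * h)


def mixed_dt_ds(x, t, s, h=1e-4):
    """Central difference in s of the total t-derivative: d^2 v / (dt ds)."""
    return (total_dt(x, t, s + h, h) - total_dt(x, t, s - h, h)) / (2 * h)


def second_order_target(x, t, s):
    """Mean of the endpoint teacher velocities minus the second-order correction."""
    x_s = flow(x, t, s)
    endpoint_mean = 0.5 * (teacher_velocity(x, t) + teacher_velocity(x_s, s))
    correction = 0.5 * (s - t) ** 2 * mixed_dt_ds(x, t, s)
    return endpoint_mean - correction


x_t, t, s = 1.3, 0.2, 0.8
x_s = flow(x_t, t, s)

residual = avg_velocity(x_t, t, s) - second_order_target(x_t, t, s)
endpoint_only_gap = avg_velocity(x_t, t, s) - 0.5 * (
    teacher_velocity(x_t, t) + teacher_velocity(x_s, s)
)

print(f"residual with correction:    {residual:.2e}")
print(f"gap without correction term: {endpoint_only_gap:.2e}")
```

On this toy, the residual with the correction term should be near zero, while the plain endpoint average is visibly off, which illustrates why the mixed-derivative term is needed. The nested finite differences are only for illustration; they are noisier than automatic differentiation and would not be a sensible choice inside a training loop.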
Remarks
- Equation (★) is the second-order MeanFlow expression: the student velocity equals the average of the two teacher velocities plus a second-order correction involving the mixed partial $\dfrac{d^2}{dt\,ds}v$.
- (Quick validation) As with MeanFlow, when $s = t$ the correction term vanishes and the two endpoint velocities coincide, so Equation (★) reduces to the flow matching target $v(x_t,t,t) = v_\phi(x_t,t)$.
- Unlike the forward and backward distillation losses, Equation (★) cannot be interpreted simply as a special case of tri-consistency.
- In practice, we expect to use this loss together with the first-order MeanFlow loss to improve few-step performance, at the cost of additional computation.
3. Generalized Second-order Loss
Given the backward and forward distillation losses:
we can combine them with our second-order MeanFlow loss using weights $\alpha$ and $\beta$ to get a generalized loss:
For $\beta = 0$ and $\alpha = 1$ or $\alpha = 0$, we recover the backward or forward distillation loss, respectively; for $\beta = 1$ and $\alpha = \frac{1}{2}$, we recover the second-order MeanFlow loss (★).
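To make the special cases concrete, here is one possible way to write the three targets. The backward target is the MeanFlow identity from the setup. The forward target follows from tri-consistency by taking the second segment to be an infinitesimal teacher step and expanding $v(x_t,t,s+ds)$ to first order in $ds$:

$$ v_{\text{bwd}} = v_\phi(x_t,t) + (s-t)\frac{d}{dt}v(x_t,t,s), \qquad v_{\text{fwd}} = v_\phi(x_s,s) - (s-t)\frac{\partial}{\partial s}v(x_t,t,s). $$

A blended target that reproduces all three special cases is, for example,

$$ v_{\alpha,\beta} = \alpha\,v_\phi(x_t,t) + (1-\alpha)\,v_\phi(x_s,s) + (1-\beta)(s-t)\Bigl[\alpha\frac{d}{dt}v - (1-\alpha)\frac{\partial}{\partial s}v\Bigr] - \frac{\beta}{2}(s-t)^2\frac{d^2}{dt\,ds}v, $$

where $v$ is shorthand for $v(x_t,t,s)$. Setting $(\alpha,\beta) = (1,0)$ gives $v_{\text{bwd}}$, $(0,0)$ gives $v_{\text{fwd}}$, and $(\tfrac{1}{2},1)$ gives the mean of the endpoint teacher velocities minus $\tfrac{1}{2}(s-t)^2\frac{d^2}{dt\,ds}v$. This particular parameterization is an assumption chosen only to match the stated special cases; other blends satisfy the same three conditions, and the intended formula may differ. A loss would then regress $v_\theta(x_t,t,s)$ onto the chosen target, typically with a stop-gradient on the target as in MeanFlow.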
References
[1] Mean Flows for One-Step Generative Modeling.
Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. arXiv preprint, arXiv:2505.13447, 2025.
[2] ICML Tutorial on the Blessing of Flow.
Qiang Liu. International Conference on Machine Learning (ICML), 2025.