% Implicit Layers
% JJ Vie
% 11/12/21

---
theme: metropolis
header-includes:
---
You can now backpropagate in any DAG.
But can you differentiate any layer?
\pause
Some layers may call solvers that take several \alert{iterations}
(e.g., root finding, fixed point, ODE solver, numerical integration)
\vspace{2mm}
Find $z \in \R^n$ such that $f(a, z) = 0$ ($a \in \R^p$ are parameters)
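For instance, a fixed-point layer computes its output by running a solver to (approximate) convergence. A minimal JAX sketch, where the tanh map, the weight scale, and the name `fixed_point_layer` are illustrative rather than taken from the slides:

```python
import jax.numpy as jnp
from jax import random

def fixed_point_layer(W, x, n_iter=50):
    # The layer output z* solves z = tanh(W @ z + x), i.e. it is a root of
    # f((W, x), z) = tanh(W @ z + x) - z, found here by plain fixed-point
    # iteration: many solver steps hide inside a single "layer".
    z = jnp.zeros_like(x)
    for _ in range(n_iter):
        z = jnp.tanh(W @ z + x)
    return z

W = 0.1 * random.normal(random.PRNGKey(0), (4, 4))  # small weights: contraction
x = random.normal(random.PRNGKey(1), (4,))
z_star = fixed_point_layer(W, x)
```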
\vspace{-5mm}
\[\begin{array}{ccc}
 & h_{t + 1} = h_t + f(h_t, \theta_t) & \frac{d h(t)}{dt} = f(h(t), t, \theta)\\
\textnormal{Input:} & x = h_0 & x = h(0)\\
\textnormal{Output:} & y = h_T & y = h(T)
\end{array}\]
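A minimal JAX sketch of the two columns (discrete residual updates vs. a continuous ODE solve), with a toy tanh vector field and a single shared weight matrix `theta`, both illustrative:

```python
import jax.numpy as jnp
from jax.experimental.ode import odeint

def f(h, t, theta):
    # Toy vector field; a real model would use per-step or time-dependent weights.
    return jnp.tanh(theta @ h)

theta = 0.1 * jnp.eye(3)
x = jnp.ones(3)

# Left column: ResNet-style discrete updates h_{t+1} = h_t + f(h_t, theta)
h = x
for t in range(10):
    h = h + f(h, t, theta)
y_resnet = h

# Right column: Neural-ODE-style dh/dt = f(h(t), t, theta), output y = h(T)
y_ode = odeint(f, x, jnp.array([0.0, 10.0]), theta)[-1]
```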
Let $f : \R^p \times \R^n \to \R^n$ be continuously differentiable, and $a_0 \in \R^p, z_0 \in \R^n$ s.t. $f(a_0, z_0) = 0$ and $\partial_z f(a_0, z_0)$ is invertible.

Then there exist open sets $V_a \in \V(a_0)$ and $V_z \in \V(z_0)$ and a \alert{unique} continuous function $\alert{z^*} : V_a \to V_z$ (“solution mapping”) s.t. $f(a, \alert{z^*}(a)) = 0$ for all $a \in V_a$.
Differentiate this identity w.r.t. $a$ and evaluate at $(a_0, z_0)$:
\begin{align}
\partial_a f(a_0, z_0) + \partial_z f(a_0, z_0) \, \alert{\partial z^*}(a_0) &= 0\\
\alert{\partial z^*}(a_0) &= - \partial_z f(a_0, z_0)^{-1} \, \partial_a f(a_0, z_0)
\end{align}
Main result: the derivative of the unknown solution mapping $z^*$ depends only on the derivatives of $f$ at $(a_0, z_0)$; there is no need to differentiate through the solver's iterations.
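A minimal JAX sketch of this formula (the toy residual $f$, the point $(a_0, z_0)$, and all names below are illustrative, not from the talk):

```python
import jax
import jax.numpy as jnp

def f(a, z):
    # Toy residual whose root z*(a) solves z^3 + z = a componentwise.
    return z ** 3 + z - a

# A point on the solution manifold: f(a0, z0) = 0.
a0 = jnp.array([2.0, 10.0])
z0 = jnp.array([1.0, 2.0])

# dz*(a0) = -[d_z f(a0, z0)]^{-1} d_a f(a0, z0): only derivatives of f are needed.
dz_f = jax.jacobian(f, argnums=1)(a0, z0)
da_f = jax.jacobian(f, argnums=0)(a0, z0)
dz_star = -jnp.linalg.solve(dz_f, da_f)
print(dz_star)  # matches d z*/d a = 1 / (3 z*^2 + 1) on the diagonal
```

In practice full Jacobians are never formed: backprop only needs vector-Jacobian products, obtained by solving a linear system with $\partial_z f(a_0, z_0)^\top$, which is essentially what jaxopt (cited below) automates.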
\raggedright
What is the shape of the path that minimizes the time for a marble to slide without friction (accelerated by gravity)?
Input: $P = (P_X, P_Y) \in \R^2$
Find path that minimizes \(T = \int_0^P \frac{ds}v\)
$ds = \sqrt{dx^2 + dy^2}$, $\frac12 mv^2 = mgy$, $y = y(x)$
Find path that minimizes \(T = \int_0^{P_X} \underbrace{\sqrt{\frac{1 + y'^2}{2gy}}}_{L(x, y, y')} dx \qquad \underbrace{\partial_y L = \frac{d}{dx} \partial_{y'} L}_{\textnormal{Euler-Lagrange}}\)
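The implication on the next line hides one step: since $L$ has no explicit dependence on $x$, Euler-Lagrange reduces to the Beltrami identity $L - y'\,\partial_{y'} L = \textnormal{const}$, which gives
\[
L - y'\,\partial_{y'} L
= \frac{1 + y'^2}{\sqrt{2gy\,(1 + y'^2)}} - \frac{y'^2}{\sqrt{2gy\,(1 + y'^2)}}
= \frac{1}{\sqrt{2gy\,(1 + y'^2)}} = \textnormal{const}
\quad\Longrightarrow\quad y\,(1 + y'^2) = \textnormal{const} =: 2R
\]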
$\Rightarrow y(1 + y'^2) = 2 R$ for some parameter $R$ \qquad $y'^2 = \frac{2R}{y} - 1$
Find parameter $R$
that minimizes loss $|y_R(P_X) - P_Y|^2$
such that $y_R$ is a solution of $y(1 + y'^2) = 2R$
from jax import grad
parameters -= GAMMA * grad(loss)(parameters)
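A minimal sketch of what `loss` could look like here, with an illustrative target point, step count, and crude fixed-step Euler integrator (none of these values come from the slides). `grad` then backpropagates through the unrolled solver iterations, which is exactly the cost that implicit differentiation avoids.

```python
import jax.numpy as jnp
from jax import grad

P_X, P_Y = 1.0, 0.65     # target point (illustrative)
GAMMA, N_STEPS = 0.1, 200

def loss(R, eps=1e-3):
    # Crude Euler integration of y' = sqrt(2R/y - 1) from x = 0 to P_X,
    # starting slightly above y = 0 to avoid the singularity at the cusp.
    dx = P_X / N_STEPS
    y = eps
    for _ in range(N_STEPS):
        y = y + dx * jnp.sqrt(jnp.clip(2.0 * R / y - 1.0, 1e-12))
    return (y - P_Y) ** 2

R = 0.5
for _ in range(100):
    R = R - GAMMA * grad(loss)(R)  # unrolls the 200 Euler steps in the backward pass
```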
Kolter, Duvenaud and Johnson, Deep Implicit Layers, Tutorial NeurIPS 2020, \url{https://implicit-layers-tutorial.org}
Blondel et al., Efficient and Modular Implicit Differentiation, \url{https://jaxopt.github.io}
Chen et al., Neural Ordinary Differential Equations, Best Paper NeurIPS 2018, \url{https://arxiv.org/abs/1806.07366}
\vspace{2mm}
Wibisono, Wilson and Jordan, A Variational Perspective on Accelerated Methods in Optimization, 2016
Even et al., A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip, NeurIPS 2021, \url{https://arxiv.org/abs/2106.07644}
XLA (accelerated linear algebra)