<aside> 📲
Real scalar functions, complex scalar functions
Real-valued vector functions, complex-valued vector functions
Real-valued tensor functions, complex-valued tensor functions
Perturbations of convex optimization problems with real parameters, and with complex parameters
</aside>
<aside>
$$ Df(x)=\frac d{dx}\otimes f=\frac{d f(x)}{dx} $$
$$ Df(\vec x)=(\partial_{x_1},\partial_{x_2},\cdots,\partial_{x_m})\otimes f(\vec x)= \left(\nabla f(\vec x)\right)^T,\vec x\in\mathbb R^m $$
$$ Df(\bm A)= \begin{pmatrix} \partial_{ij}\\ \end{pmatrix}_{ij}\otimes f(\bm A)= \begin{pmatrix} \partial_{ij}f(\bm A)\\ \end{pmatrix}_{ij},\quad\bm A\in\mathbb R^{m\times n} $$
$$ D\vec f(\vec x)= (\partial_{x_1},\partial_{x_2},\cdots,\partial_{x_m})\otimes \vec f(\vec x)= \begin{pmatrix} \partial_j f_i\\ \end{pmatrix}_{ij}= \begin{pmatrix} \nabla f_i\\ \end{pmatrix}^T=\bm J\left(\vec f\right) $$
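As a numerical check of this row-of-gradients convention, a minimal sketch using torch.autograd.functional.jacobian (the function f is made up for illustration):

```python
import torch

# f: R^3 -> R^2, chosen arbitrarily for illustration
def f(x):
    return torch.stack([x[0] * x[1], torch.sin(x[2])])

x = torch.tensor([1.0, 2.0, 3.0])
J = torch.autograd.functional.jacobian(f, x)
# J has shape (2, 3) with J[i, j] = df_i/dx_j,
# i.e. row i is (grad f_i)^T as in the formula above
print(J)
```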
Fréchet derivative https://en.wikipedia.org/wiki/Fréchet_derivative
$$ \lim_{\|h\|_{V} \to 0} \frac{\| f(x+h) - f(x) - A h \|_{W}}{\|h\|_{V}} = 0. $$
Gateaux derivative https://en.wikipedia.org/wiki/Gateaux_derivative
$$ g(v) = \lim_{t \to 0} \frac{f(x + t v) - f(x)}{t}. $$
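Since the Gateaux derivative is a directional derivative, g(v) can be sanity-checked against a finite-difference quotient; a minimal sketch (example function assumed), where the same value also equals J v:

```python
import torch

def f(x):
    return torch.stack([x[0] ** 2, x[0] * x[1]])

x = torch.tensor([1.0, 2.0], dtype=torch.double)
v = torch.tensor([0.5, -1.0], dtype=torch.double)

# difference quotient (f(x + t v) - f(x)) / t for small t
t = 1e-7
g_fd = (f(x + t * v) - f(x)) / t

# for Fréchet-differentiable f this is also J v
J = torch.autograd.functional.jacobian(f, x)
print(g_fd, J @ v)  # agree up to finite-difference error
```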
commutation matrix https://en.wikipedia.org/wiki/Commutation_matrix
$$ \bm K^{(m,n)}\operatorname{vec}(\bm A) = \operatorname{vec}(\bm A^T) . $$
$$ \operatorname{vec}(\bm A)= (a_{i+m(j-1)})_{mn\times 1},\bm A\in V^{m\times n} $$
i.e. the permutation matrix $P_\pi$ of the permutation $\pi(i+m(j-1))=j+n(i-1)$
$$ \operatorname{vec}(\bm A\bm B\bm C)= (\bm C^T\otimes \bm A)\operatorname{vec}(\bm B) $$
→ $\bm K^{(r,m)}(\bm A\otimes \bm B)\operatorname{vec} \bm X= (\bm B\otimes \bm A)\bm K^{(q,n)}\operatorname{vec} \bm X,\bm X\in V^{q\times n}$
$$ \bm K^{(r,m)}(\bm A\otimes \bm B)\bm K^{(n,q)}= \bm B\otimes \bm A,\quad\bm A\in V^{m\times n},\bm B\in V^{r\times q}\\ \bm K^{(r,m)}(\bm A\otimes \bm B)= (\bm B\otimes \bm A)\bm K^{(q,n)} $$
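A NumPy sketch verifying the identities above; the helper commutation builds $\bm K^{(m,n)}$ directly from the permutation $\pi(i+m(j-1))=j+n(i-1)$, it is not a library routine:

```python
import numpy as np

def vec(A):
    # column-major stacking: vec(A)[i + m*(j-1)] = a_ij (1-based)
    return A.reshape(-1, order="F")

def commutation(m, n):
    # permutation matrix with K @ vec(A) = vec(A.T) for A of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[j + n * i, i + m * j] = 1.0  # 0-based version of pi
    return K

m, n, p, r, q = 2, 3, 4, 3, 2
A = np.random.randn(m, n)
assert np.allclose(commutation(m, n) @ vec(A), vec(A.T))

B, C = np.random.randn(n, p), np.random.randn(p, q)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

Bq = np.random.randn(r, q)
lhs = commutation(r, m) @ np.kron(A, Bq) @ commutation(n, q)
assert np.allclose(lhs, np.kron(Bq, A))  # K^(r,m) (A x B) K^(n,q) = B x A
print("identities verified")
```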
$$ DF(\bm A)= \begin{pmatrix} \partial_{A_{ij}}\\ \end{pmatrix}_{ij} \otimes F= \begin{pmatrix} \frac{\partial F_{pq}}{\partial A_{ij}}\\ \end{pmatrix} $$
</aside>
https://en.wikipedia.org/wiki/Dual_number#Automatic_differentiation
$$ f(a+b\varepsilon )=\sum _{n=0}^{\infty }{\frac {f^{(n)}(a)b^{n}\varepsilon ^{n}}{n!}}=f(a)+bf'(a)\varepsilon $$
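Because $\varepsilon^2=0$ truncates the Taylor series after the first-order term, pushing (value, derivative) pairs through each elementary operation yields exact derivatives; a minimal dual-number sketch in plain Python (illustrative only, not PyTorch's implementation):

```python
class Dual:
    """a + b*eps with eps**2 == 0; the b slot carries the derivative."""

    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        # (a1 + b1 eps)(a2 + b2 eps) = a1 a2 + (a1 b2 + b1 a2) eps
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

# f(x) = 3x^2 + 2x, seeded with tangent b = 1 to read off f'(x)
x = Dual(2.0, 1.0)
y = 3 * x * x + 2 * x
print(y.a, y.b)  # 16.0 and f'(2) = 6*2 + 2 = 14.0
```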
Hence it suffices to describe every operation in the construction of the function with dual tensors; once the function has been built, its derivative can be read off directly. https://docs.pytorch.org/tutorials/intermediate/forward_ad_usage.html#usage-with-modules
This scheme takes twice the memory for the weights, so it suits functions whose input dimension is small (the variables occupy little memory) and whose output dimension is large.
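A minimal sketch of the dual-tensor workflow from the linked tutorial, using torch.autograd.forward_ad:

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.randn(3)  # the direction v of the Jacobian-vector product

with fwAD.dual_level():
    # a dual tensor plays the role of a + b*eps
    dual = fwAD.make_dual(primal, tangent)
    out = (dual ** 2).sum() + dual.sin().sum()
    # the tangent of the output is J v, computed alongside the value
    value, jvp = fwAD.unpack_dual(out)

print(value, jvp)
```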
register the jvp() static method for a custom autograd Function
$$ \bm J \vec v $$
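A sketch of that registration, following the custom-Function pattern in the linked tutorial (Exp is an illustrative example; the forward result is stashed on ctx for reuse in jvp):

```python
import torch
import torch.autograd.forward_ad as fwAD

class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        result = torch.exp(x)
        ctx.result = result  # reused by jvp below
        return result

    @staticmethod
    def jvp(ctx, x_t):
        # directional derivative of exp at x in direction x_t
        return ctx.result * x_t

primal = torch.randn(4)
tangent = torch.randn(4)
with fwAD.dual_level():
    out = Exp.apply(fwAD.make_dual(primal, tangent))
    print(fwAD.unpack_dual(out).tangent)  # equals exp(primal) * tangent
```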
<aside> 📲
A similar method works for polynomials of n variables, using the Exterior Algebra https://en.wikipedia.org/wiki/Exterior_algebra of an n-dimensional vector space. The Grassmann algebra $Gr(n,\mathbb C)$ → an extension of the dual numbers; see also https://en.wikipedia.org/wiki/Quadratic_algebra
</aside>
vjp()
$$ g(\vec v)=\vec v^{\,T} \bm J $$
The chain rule for the Jacobian thus becomes a chain rule for g, and since g is a vector-valued function, the Jacobian never has to be computed explicitly (closed-form rules can be used instead).
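A sketch using torch.autograd.functional.vjp, which returns $\vec v^{\,T}\bm J$ without ever materializing $\bm J$ (the function f is made up for illustration):

```python
import torch

def f(x):
    return torch.stack([x[0] * x[1], x[1] ** 2, x[0].sin()])

x = torch.tensor([1.0, 2.0])
v = torch.tensor([1.0, 0.5, -1.0])  # the row vector multiplying J from the left

# returns (f(x), v^T J); J itself is never formed
out, vjp = torch.autograd.functional.vjp(f, x, v)
print(vjp)
```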
<aside>
Differentiating optimization problems
The derivative of the solution map of a convex cone program, when it exists.
Given a perturbation of the cone program coefficients, compute the resulting change in the solution; and compute the gradient of a function of the solution with respect to the coefficients (the adjoint).
The number of coefficients can run into the millions ($\sim 10^6$). </aside>
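A sketch of differentiating through a cone program with the diffcp package, which implements this solution-map derivative; the toy box-constrained LP below is made up (and chosen so the instance is always solvable):

```python
import numpy as np
import scipy.sparse as sparse
import diffcp

# minimize c^T x  s.t.  Ax + s = b, s in R^{2n}_+  (the box -1 <= x <= 1)
n = 3
rng = np.random.default_rng(0)
c = rng.standard_normal(n)
A = sparse.vstack([sparse.eye(n), -sparse.eye(n)], format="csc")
b = np.ones(2 * n)
cone_dict = {"l": 2 * n}  # nonnegative orthant of dimension 2n

# D is the derivative of the solution map, DT its adjoint
x, y, s, D, DT = diffcp.solve_and_derivative(A, b, c, cone_dict)

# push a perturbation of the coefficients through D
dA = sparse.csc_matrix(A.shape)
db = np.zeros(2 * n)
dc = 1e-2 * rng.standard_normal(n)
dx, dy, ds = D(dA, db, dc)
print(x, dx)  # the solution and its first-order change
```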
residual map
the homogeneous self-dual problem
for Linear Programming
Ye et al. (1994), "An O(√nL)-Iteration Homogeneous and Self-Dual Linear Programming Algorithm"
for cone programming
homogeneous self-dual / symmetric embedding
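For reference, a sketch of the embedding for the SCS-style primal form $\min\ \vec c^T\vec x$ s.t. $\bm A\vec x+\vec s=\vec b$, $\vec s\in\mathcal K$: the skew-symmetry of $\bm Q$ makes the system self-dual, and the zero right-hand side makes it homogeneous.

$$ \text{find }(u,v):\ v=\bm Q u,\quad \bm Q=\begin{pmatrix} 0 & \bm A^T & \vec c\\ -\bm A & 0 & \vec b\\ -\vec c^{\,T} & -\vec b^{\,T} & 0 \end{pmatrix},\quad u=\begin{pmatrix}\vec x\\ \vec y\\ \tau\end{pmatrix}\in\mathbb R^n\times\mathcal K^*\times\mathbb R_+,\ v=\begin{pmatrix}\vec 0\\ \vec s\\ \kappa\end{pmatrix}\in\{0\}^n\times\mathcal K\times\mathbb R_+ $$

If a nonzero solution has $\tau>0$, then $(\vec x/\tau,\vec y/\tau,\vec s/\tau)$ is primal-dual optimal; if $\kappa>0$, it certifies primal or dual infeasibility.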