Optimization problem
Three basic elements
variables:
$\boldsymbol{x}=\left(x_{1}, \dots, x_{n}\right)$
constraints:
$f_{i}: \mathbb{R}^{n} \rightarrow \mathbb{R}, \; i=1,\dots,m$
objective:
$f_{0}: \mathbb{R}^{n} \rightarrow \mathbb{R}$
Goal
Find an optimal solution $\boldsymbol{x}^{*}$ minimizing $f_{0}$ while satisfying the constraints.
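Putting the three elements together, the problem can be written in the standard form below (the inequality form of the constraints is a common convention; the notes above only specify that each $f_i$ maps $\mathbb{R}^{n}$ to $\mathbb{R}$):

$$
\underset{\boldsymbol{x} \in \mathbb{R}^{n}}{\operatorname{minimize}} \quad f_{0}(\boldsymbol{x}) \quad \text{subject to} \quad f_{i}(\boldsymbol{x}) \leq 0, \quad i=1,\dots,m
$$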
Data science models
Linear
- minimize $f(\boldsymbol{x})$ subject to $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{z}$
- hope $\hat{\boldsymbol{x}}=\boldsymbol{x}^{\natural}$
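A minimal numpy sketch of this linear model, assuming $f$ is a least-squares objective and the measurements are noiseless; the problem sizes and names (`x_true`, etc.) are illustrative, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20                       # more measurements than unknowns (an assumption)
A = rng.standard_normal((m, n))     # measurement matrix
x_true = rng.standard_normal(n)     # plays the role of x^natural
z = A @ x_true                      # noiseless linear measurements z = A x

# Surrogate for "minimize f(x) subject to Ax = z": least-squares fit
x_hat, *_ = np.linalg.lstsq(A, z, rcond=None)
print(np.allclose(x_hat, x_true))   # True: exact recovery in this easy, noiseless case
```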
Bilinear
- find $\boldsymbol{x}, \boldsymbol{h}$ subject to $z_{i}=\boldsymbol{b}_{i}^{*} \boldsymbol{h} \boldsymbol{x}^{*} \boldsymbol{a}_{i}, \quad 1 \leq i \leq m$
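To make the bilinear notation concrete, here is a sketch that only generates such measurements (as in blind deconvolution); the complex Gaussian $\boldsymbol{b}_{i}$, $\boldsymbol{a}_{i}$ and the sizes are assumptions for illustration, and the recovery of $\boldsymbol{h}, \boldsymbol{x}$ is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, m = 8, 8, 30
h = rng.standard_normal(K) + 1j * rng.standard_normal(K)             # unknown vector h
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)             # unknown vector x
B = rng.standard_normal((m, K)) + 1j * rng.standard_normal((m, K))   # rows are b_i
A = rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))   # rows are a_i

# z_i = b_i^* h x^* a_i : each measurement couples h and x bilinearly
z = np.array([np.vdot(B[i], h) * np.vdot(x, A[i]) for i in range(m)])
```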
Quadratic
- find $\boldsymbol{z} \quad$ subject to $y_{r}=\left|\left\langle\boldsymbol{a}_{r}, \boldsymbol{z}\right\rangle\right|^{2}, \quad r=1,2, \ldots, m$
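This quadratic case is the phase-retrieval measurement model; the sketch below only generates the data $y_{r}$ to show why the problem is hard (the sign/phase of each inner product is lost), with Gaussian sensing vectors as an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 16, 80
z = rng.standard_normal(n)          # unknown signal
A = rng.standard_normal((m, n))     # rows are the sensing vectors a_r
y = np.abs(A @ z) ** 2              # y_r = |<a_r, z>|^2 : phase/sign information is lost
# Both z and -z explain the data, so z is identifiable only up to a global sign.
```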
Low-rank
- $\underset{\boldsymbol{M} \in \mathbb{C}^{m \times n}}{\operatorname{minimize}} \quad \operatorname{rank}(\boldsymbol{M}) \quad$ subject to $\quad Y_{i, k}=M_{i, k}, \quad(i, k) \in \Omega$
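One common heuristic for this matrix-completion problem alternates between enforcing the observed entries and projecting onto low-rank matrices; the sketch below assumes the target rank is known, which is an illustrative assumption rather than part of the formulation above.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 30, 20, 2
Y = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r ground truth
omega = rng.random((m, n)) < 0.5                                # observed index set Omega

M = np.zeros((m, n))
for _ in range(200):
    M[omega] = Y[omega]                              # enforce Y_{i,k} = M_{i,k} on Omega
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M = (U[:, :r] * s[:r]) @ Vt[:r]                  # project onto rank-r matrices
```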
Deep models
- $\operatorname{minimize} \quad f(\boldsymbol{\theta}):=\frac{1}{n} \sum_{i=1}^{n} \ell\left(h\left(\boldsymbol{x}_{i}, \boldsymbol{\theta}\right), y_{i}\right)$ subject to $\mathcal{R}(\boldsymbol{\theta}) \leq \tau$
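A tiny numpy instance of this constrained empirical-risk problem, with a linear model $h$, squared loss $\ell$, and $\mathcal{R}(\boldsymbol{\theta})=\|\boldsymbol{\theta}\|_2^{2}$ as placeholder choices (all assumptions), solved by projected gradient descent:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
tau = 10.0                                      # constraint level for R(theta) <= tau

def f(theta):
    """Empirical risk (1/n) sum_i loss(h(x_i, theta), y_i) with squared loss."""
    return np.mean((X @ theta - y) ** 2)

theta = np.zeros(d)
for _ in range(500):
    grad = 2.0 / n * X.T @ (X @ theta - y)      # gradient of the empirical risk
    theta -= 0.01 * grad                        # plain gradient step
    norm2 = theta @ theta
    if norm2 > tau:                             # project onto {||theta||^2 <= tau}
        theta *= np.sqrt(tau / norm2)
```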
Large-scale optimization: classes of optimization problems
Constrained vs. unconstrained
- Unconstrained optimization:
- every point is feasible
- focus is on minimizing $f(\boldsymbol{x})$
- Constrained optimization:
- convex optimization:
- stochastic gradient
Solvability vs. scalability
- Polynomial-time algorithms might be useless in large-scale applications (iteration -)
- example: Newton’s method
- a single iteration may last forever (time +)
- prohibitive storage requirement (space +)
- Iteration complexity vs. per-iteration cost
- computational cost = iteration complexity (#iterations) $\times$ cost per iteration
- Large-scale problems call for methods with ==cheap iterations==
- First-order methods: exploit only information on function values and (sub)gradients, without Hessian information
- cheap iterations
- low memory requirements
- Randomized and approximation methods: project data into subspace, and ==solve reduced dimension problem== (see the sketched least-squares example after this list)
- optimization for high-dimensional data analysis: polynomial-time algorithms often not fast enough: ==further approximations== are essential
- Advanced large-scale optimization
- randomized methods; nonconvex optimization on manifold; learning to optimize; deep reinforcement learning
- goal: scalable, real-time, parallel, distributed, automatic, etc.
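As one concrete example of the "project data into a subspace, solve the reduced-dimension problem" idea, here is a sketch-and-solve least-squares example; the Gaussian sketch and the problem sizes are illustrative assumptions (in practice a fast sketch such as an SRHT or a sparse embedding would be used, since forming $SA$ with a dense Gaussian $S$ is itself costly).

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, s = 5000, 50, 400             # tall problem; sketch down to s << m rows
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

S = rng.standard_normal((s, m)) / np.sqrt(s)           # Gaussian sketching matrix
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)         # full-size solve, ~O(m n^2)
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # reduced-dimension solve, ~O(s n^2)

# The sketched solution is near-optimal: residual ratio close to 1
print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_full - b))
```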
High-dimensional statistics
Convex geometry
Local geometry
Global geometry
Topics and grading
Theoretical foundations
convex sets, convex functions, convex problems, Lagrange duality and KKT conditions, disciplined convex programming
first-order methods
gradient methods, subgradient methods, proximal methods
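As a preview of the proximal methods listed above, here is a minimal proximal-gradient (ISTA) sketch for the lasso; the problem data, regularization weight, and step size $1/L$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, lam = 200, 400, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
x = np.zeros(n)
for _ in range(300):
    g = A.T @ (A @ x - b)              # gradient of (1/2)||Ax - b||^2
    z = x - g / L                      # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # prox of lam*||.||_1 (soft-threshold)
```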
second-order methods
Newton methods, interior-point methods, quasi-Newton methods
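For the second-order methods, a bare Newton iteration is sketched below on an arbitrary smooth, strongly convex test function (chosen purely for illustration); note that each step solves an $n \times n$ linear system, which is the per-iteration cost flagged earlier.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
Q = rng.standard_normal((n, n))
Q = Q @ Q.T + n * np.eye(n)                        # fixed positive-definite matrix
c = rng.standard_normal(n)

# f(x) = 0.5 x^T Q x + c^T x + sum_i log(1 + exp(x_i))   (smooth, strongly convex)
def grad(x):
    return Q @ x + c + 1.0 / (1.0 + np.exp(-x))

def hess(x):
    s = 1.0 / (1.0 + np.exp(-x))                   # sigmoid
    return Q + np.diag(s * (1.0 - s))

x = np.zeros(n)
for _ in range(20):
    x = x - np.linalg.solve(hess(x), grad(x))      # Newton step: one n x n solve per iteration
```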
stochastic methods
stochastic gradient methods, stochastic Newton methods, randomized sketching methods, randomized linear algebra
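And as an instance of the stochastic gradient methods listed last, a minibatch SGD sketch for least-squares regression; the batch size and decaying step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, batch = 10000, 20, 32
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
for t in range(2000):
    idx = rng.integers(0, n, size=batch)                   # sample a minibatch
    g = 2.0 / batch * X[idx].T @ (X[idx] @ w - y[idx])     # unbiased gradient estimate
    w -= 0.01 / np.sqrt(t + 1) * g                         # decaying step size
```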