Optimization problem
Three basic elements
variables:
$\boldsymbol{x}=\left(x_{1}, \dots, x_{n}\right)$
constraints:
$f_{i}: \mathbb{R}^{n} \rightarrow \mathbb{R}, \; i=1,\dots,m$
objective:
$f_{0}: \mathbb{R}^{n} \rightarrow \mathbb{R}$
Goal
Find an optimal solution $\boldsymbol{x}^{*}$ minimizing $f_{0}$ while satisfying the constraints.
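Putting the three elements together, the problem can be written in the standard form below (the inequality form of the constraints is a common convention; the notes above only specify that each $f_i$ maps $\mathbb{R}^{n}$ to $\mathbb{R}$):

$$
\underset{\boldsymbol{x} \in \mathbb{R}^{n}}{\operatorname{minimize}} \quad f_{0}(\boldsymbol{x}) \quad \text{subject to} \quad f_{i}(\boldsymbol{x}) \leq 0, \quad i=1,\dots,m
$$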
Data science models
Linear
- minimize $f(\boldsymbol{x})$ subject to $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{z}$
- hope $\hat{\boldsymbol{x}}=\boldsymbol{x}^{\natural}$
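A minimal numpy sketch of this linear model, assuming $f$ is a least-squares objective and the measurements are noiseless; the problem sizes and names (`x_true`, etc.) are illustrative, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 20                       # more measurements than unknowns (an assumption)
A = rng.standard_normal((m, n))     # measurement matrix
x_true = rng.standard_normal(n)     # plays the role of x^natural
z = A @ x_true                      # noiseless linear measurements z = A x

# Surrogate for "minimize f(x) subject to Ax = z": least-squares fit
x_hat, *_ = np.linalg.lstsq(A, z, rcond=None)
print(np.allclose(x_hat, x_true))   # True: exact recovery in this easy, noiseless case
```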
Bilinear
- find $\boldsymbol{x}, \boldsymbol{h}$ subject to $z_{i}=\boldsymbol{b}_{i}^{*} \boldsymbol{h} \boldsymbol{x}^{*} \boldsymbol{a}_{i}, \quad 1 \leq i \leq m$
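To make the bilinear notation concrete, here is a sketch that only generates such measurements (as in blind deconvolution); the complex Gaussian $\boldsymbol{b}_{i}$, $\boldsymbol{a}_{i}$ and the sizes are assumptions for illustration, and the recovery of $\boldsymbol{h}, \boldsymbol{x}$ is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, m = 8, 8, 30
h = rng.standard_normal(K) + 1j * rng.standard_normal(K)             # unknown vector h
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)             # unknown vector x
B = rng.standard_normal((m, K)) + 1j * rng.standard_normal((m, K))   # rows are b_i
A = rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))   # rows are a_i

# z_i = b_i^* h x^* a_i : each measurement couples h and x bilinearly
z = np.array([np.vdot(B[i], h) * np.vdot(x, A[i]) for i in range(m)])
```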
Quadratic
- find $\boldsymbol{z} \quad$ subject to $y_{r}=\left|\left\langle\boldsymbol{a}_{r}, \boldsymbol{z}\right\rangle\right|^{2}, \quad r=1,2, \ldots, m$
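This quadratic case is the phase-retrieval measurement model; the sketch below only generates the data $y_{r}$ to show why the problem is hard (the sign/phase of each inner product is lost), with Gaussian sensing vectors as an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 16, 80
z = rng.standard_normal(n)          # unknown signal
A = rng.standard_normal((m, n))     # rows are the sensing vectors a_r
y = np.abs(A @ z) ** 2              # y_r = |<a_r, z>|^2 : phase/sign information is lost
# Both z and -z explain the data, so z is identifiable only up to a global sign.
```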
Low-rank
- $\underset{\boldsymbol{M} \in \mathbb{C}^{m \times n}}{\operatorname{minimize}} \quad \operatorname{rank}(\boldsymbol{M}) \quad$ subject to $\quad Y_{i, k}=M_{i, k}, \quad(i, k) \in \Omega$
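One common heuristic for this matrix-completion problem alternates between enforcing the observed entries and projecting onto low-rank matrices; the sketch below assumes the target rank is known, which is an illustrative assumption rather than part of the formulation above.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 30, 20, 2
Y = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r ground truth
omega = rng.random((m, n)) < 0.5                                # observed index set Omega

M = np.zeros((m, n))
for _ in range(200):
    M[omega] = Y[omega]                              # enforce Y_{i,k} = M_{i,k} on Omega
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M = (U[:, :r] * s[:r]) @ Vt[:r]                  # project onto rank-r matrices
```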
Deep models
- $\operatorname{minimize} \quad f(\boldsymbol{\theta}):=\frac{1}{n} \sum_{i=1}^{n} \ell\left(h\left(\boldsymbol{x}_{i}, \boldsymbol{\theta}\right), y_{i}\right)$ subject to $\mathcal{R}(\boldsymbol{\theta}) \leq \tau$
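A tiny numpy instance of this constrained empirical-risk problem, with a linear model $h$, squared loss $\ell$, and $\mathcal{R}(\boldsymbol{\theta})=\|\boldsymbol{\theta}\|_2^{2}$ as placeholder choices (all assumptions), solved by projected gradient descent:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
tau = 10.0                                      # constraint level for R(theta) <= tau

def f(theta):
    """Empirical risk (1/n) sum_i loss(h(x_i, theta), y_i) with squared loss."""
    return np.mean((X @ theta - y) ** 2)

theta = np.zeros(d)
for _ in range(500):
    grad = 2.0 / n * X.T @ (X @ theta - y)      # gradient of the empirical risk
    theta -= 0.01 * grad                        # plain gradient step
    norm2 = theta @ theta
    if norm2 > tau:                             # project onto {||theta||^2 <= tau}
        theta *= np.sqrt(tau / norm2)
```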
Large-scale optimization: classes of optimization problems
Constrained vs. unconstrained
- Unconstrained optimization:
- every point is feasible
- focus is on minimizing $f(\boldsymbol{x})$
- Constrained optimization:
- convex optimization:
- stochastic gradient
Solvability vs. scalability
- Polynomial-time algorithms might be useless in large-scale applications (iteration -)
- example: Newton’s method
- a single iteration may last forever (time +)
- prohibitive storage requirement (space +)
- Iteration complexity vs. per-iteration cost
- computational cost = iteration complexity (#iterations) $\times$ cost per iteration
- Large-scale problems call for methods with ==cheap iterations==
- First-order methods: exploit only information on function values and (sub)gradients, without Hessian information
- cheap iterations
- low memory requirements
- Randomized and approximation methods: project data into subspace, and ==solve reduced dimension problem== (see the sketched least-squares example after this list)
- optimization for high-dimensional data analysis: polynomial-time algorithms often not fast enough: ==further approximations== are essential
- Advanced large-scale optimization
- randomized methods; nonconvex optimization on manifold; learning to optimize; deep reinforcement learning
- goal: scalable, real-time, parallel, distributed, automatic, etc.
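As one concrete example of the "project data into a subspace, solve the reduced-dimension problem" idea, here is a sketch-and-solve least-squares example; the Gaussian sketch and the problem sizes are illustrative assumptions (in practice a fast sketch such as an SRHT or a sparse embedding would be used, since forming $SA$ with a dense Gaussian $S$ is itself costly).

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, s = 5000, 50, 400             # tall problem; sketch down to s << m rows
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

S = rng.standard_normal((s, m)) / np.sqrt(s)           # Gaussian sketching matrix
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)         # full-size solve, ~O(m n^2)
x_sk, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)   # reduced-dimension solve, ~O(s n^2)

# The sketched solution is near-optimal: residual ratio close to 1
print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_full - b))
```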
High-dimensional statistics
Convex geometry
Local geometry
Global geometry
Topics and grading
Theoretical foundations
convex sets, convex functions, convex problems, Lagrange duality and KKT conditions, disciplined convex programming
first-order methods
gradient methods, subgradient methods, proximal methods
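As a preview of the proximal methods listed above, here is a minimal proximal-gradient (ISTA) sketch for the lasso; the problem data, regularization weight, and step size $1/L$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, lam = 200, 400, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
x = np.zeros(n)
for _ in range(300):
    g = A.T @ (A @ x - b)              # gradient of (1/2)||Ax - b||^2
    z = x - g / L                      # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # prox of lam*||.||_1 (soft-threshold)
```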
second-order methods
Newton methods, interior-point methods, quasi-Newton methods
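For the second-order methods, a bare Newton iteration is sketched below on an arbitrary smooth, strongly convex test function (chosen purely for illustration); note that each step solves an $n \times n$ linear system, which is the per-iteration cost flagged earlier.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
Q = rng.standard_normal((n, n))
Q = Q @ Q.T + n * np.eye(n)                        # fixed positive-definite matrix
c = rng.standard_normal(n)

# f(x) = 0.5 x^T Q x + c^T x + sum_i log(1 + exp(x_i))   (smooth, strongly convex)
def grad(x):
    return Q @ x + c + 1.0 / (1.0 + np.exp(-x))

def hess(x):
    s = 1.0 / (1.0 + np.exp(-x))                   # sigmoid
    return Q + np.diag(s * (1.0 - s))

x = np.zeros(n)
for _ in range(20):
    x = x - np.linalg.solve(hess(x), grad(x))      # Newton step: one n x n solve per iteration
```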
stochastic methods
stochastic gradient methods, stochastic Newton methods, randomized sketching methods, randomized linear algebra
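And as an instance of the stochastic gradient methods listed last, a minibatch SGD sketch for least-squares regression; the batch size and decaying step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d, batch = 10000, 20, 32
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
for t in range(2000):
    idx = rng.integers(0, n, size=batch)                   # sample a minibatch
    g = 2.0 / batch * X[idx].T @ (X[idx] @ w - y[idx])     # unbiased gradient estimate
    w -= 0.01 / np.sqrt(t + 1) * g                         # decaying step size
```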