Convex Optimization: Introduction

Optimization problem

Three basic elements

  • variables: $\boldsymbol{x}=\left(x_{1}, \dots, x_{n}\right)$
  • constraints: $f_{i}: \mathbb{R}^{n} \rightarrow \mathbb{R}, i = 1,…,m$
  • objective: $f_{0}: \mathbb{R}^{n} \rightarrow \mathbb{R}$

Goal

Find an optimal solution $\boldsymbol{x}^{*}$ that minimizes $f_{0}$ while satisfying all constraints.
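
Putting the three elements together, the problem can be written in the standard form

$$
\begin{array}{ll}
\underset{\boldsymbol{x} \in \mathbb{R}^{n}}{\operatorname{minimize}} & f_{0}(\boldsymbol{x}) \\
\text{subject to} & f_{i}(\boldsymbol{x}) \leq 0, \quad i=1, \ldots, m
\end{array}
$$

(the constraints are written here as inequalities; equality constraints are handled analogously).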

Data science models

Linear

  • minimize $f(\boldsymbol{x})$ subject to $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{z}$
  • hope that the estimate $\hat{\boldsymbol{x}}$ recovers the ground truth $\boldsymbol{x}^{\natural}$
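
A minimal cvxpy sketch of this template, assuming (purely for illustration) that $f$ is the $\ell_{1}$ norm as in sparse recovery; the dimensions and data below are hypothetical:

```python
import numpy as np
import cvxpy as cp

# Hypothetical data: z = A @ x_true with a sparse ground truth x_true.
rng = np.random.default_rng(0)
m, n = 40, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
z = A @ x_true

# minimize f(x) subject to A x = z, with f taken to be the l1 norm in this sketch.
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == z])
prob.solve()
x_hat = x.value  # hope: x_hat is (close to) x_true
```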

Bilinear

  • find $\boldsymbol{x}, \boldsymbol{h}$ subject to $z_{i}=\boldsymbol{b}_{i}^{*} \boldsymbol{h} \boldsymbol{x}^{*} \boldsymbol{a}_{i}, \quad 1 \leq i \leq m$
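
To make the measurement model concrete, the numpy sketch below only generates bilinear measurements of the form $z_{i}=\boldsymbol{b}_{i}^{*} \boldsymbol{h} \, \boldsymbol{x}^{*} \boldsymbol{a}_{i}$ (interpreting $^{*}$ as conjugate transpose); recovering $\boldsymbol{x}, \boldsymbol{h}$ from them is the hard part, and all dimensions here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, m = 8, 12, 60
# Unknown complex vectors h, x and known measurement vectors b_i, a_i (stored as rows).
h = rng.standard_normal(K) + 1j * rng.standard_normal(K)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
B = rng.standard_normal((m, K)) + 1j * rng.standard_normal((m, K))
A = rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))

# z_i = (b_i^* h) * (x^* a_i): each measurement is bilinear in the pair (h, x).
z = (B.conj() @ h) * (A @ x.conj())
```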

Quadratic

  • find $\boldsymbol{z}$ subject to $y_{r}=\left|\left\langle\boldsymbol{a}_{r}, \boldsymbol{z}\right\rangle\right|^{2}, \quad r=1,2, \ldots, m$
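
Similarly, the quadratic (phase-retrieval-style) measurements $y_{r}=\left|\left\langle\boldsymbol{a}_{r}, \boldsymbol{z}\right\rangle\right|^{2}$ can be generated as below; only magnitudes are observed, which is what makes the problem nonconvex (dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 16, 100
z_true = rng.standard_normal(n)   # unknown signal
A = rng.standard_normal((m, n))   # sensing vectors a_r stored as rows
y = np.abs(A @ z_true) ** 2       # y_r = |<a_r, z>|^2: the sign/phase is lost
```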

Low-rank

  • $\underset{\boldsymbol{M} \in \mathbb{C}^{m \times n}}{\operatorname{minimize}} \quad \operatorname{rank}(\boldsymbol{M}) \quad$ subject to $\quad Y_{i, k}=M_{i, k}, \quad(i, k) \in \Omega$
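
Rank minimization itself is nonconvex (and NP-hard in general); a standard convex surrogate, shown here only as an illustration, replaces $\operatorname{rank}(\boldsymbol{M})$ with the nuclear norm. A minimal cvxpy sketch on a hypothetical real-valued instance:

```python
import numpy as np
import cvxpy as cp

# Hypothetical instance: observe a few entries of a rank-1 ground-truth matrix.
rng = np.random.default_rng(0)
Y = np.outer(rng.standard_normal(6), rng.standard_normal(5))
Omega = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3), (5, 0), (0, 4), (2, 1), (4, 0), (5, 2)]

M = cp.Variable((6, 5))
constraints = [M[i, k] == Y[i, k] for (i, k) in Omega]
# Nuclear norm (sum of singular values) as the convex surrogate for rank.
prob = cp.Problem(cp.Minimize(cp.normNuc(M)), constraints)
prob.solve()
M_hat = M.value
```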

Deep models

  • $\operatorname{minimize} \quad f(\boldsymbol{\theta}):=\frac{1}{n} \sum_{i=1}^{n} \ell\left(h\left(\boldsymbol{x}_{i}, \boldsymbol{\theta}\right), y_{i}\right)$ subject to $\mathcal{R}(\boldsymbol{\theta}) \leq \tau$
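
As one concrete instance of this template, the sketch below evaluates $f(\boldsymbol{\theta})$ for an illustrative linear model $h(\boldsymbol{x}, \boldsymbol{\theta})=\boldsymbol{\theta}^{\top} \boldsymbol{x}$ with the logistic loss, and a squared $\ell_{2}$ norm as one possible choice of $\mathcal{R}$; these specific choices are assumptions made for the example, not part of the original formulation:

```python
import numpy as np

def f(theta, X, y):
    """Empirical risk (1/n) * sum_i loss(h(x_i, theta), y_i) with a linear model
    h(x, theta) = theta @ x, logistic loss, and labels y_i in {-1, +1}."""
    margins = y * (X @ theta)
    return np.mean(np.logaddexp(0.0, -margins))  # log(1 + exp(-margin)), numerically stable

def R(theta):
    """One possible regularizer: squared l2 norm; the constraint reads R(theta) <= tau."""
    return float(np.dot(theta, theta))
```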

Large-scale optimization: classes of optimization problems

Constrained vs. unconstrained

  • Unconstrained optimization:
    • every point is feasible
    • the focus is on minimizing $f(\boldsymbol{x})$
  • Constrained optimization:
    • the problem specifies a constraint set $C$
    • minimize $f(\boldsymbol{x})$ over $\boldsymbol{x} \in C$
    • it may even be difficult to find a feasible point $\boldsymbol{x} \in C$

Convex vs. nonconvex

  • Convex optimization:
    • all local optima are global optima (see the short argument after this list)
    • can be solved in polynomial time
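
The first bullet follows directly from the definition of convexity: if $f_{0}$ is convex, the feasible set is convex, and $\boldsymbol{x}$ is a local minimizer, then for any feasible $\boldsymbol{y}$ and sufficiently small $\lambda \in (0, 1]$,

$$
f_{0}(\boldsymbol{x}) \leq f_{0}(\boldsymbol{x}+\lambda(\boldsymbol{y}-\boldsymbol{x})) \leq(1-\lambda) f_{0}(\boldsymbol{x})+\lambda f_{0}(\boldsymbol{y}),
$$

which rearranges to $f_{0}(\boldsymbol{x}) \leq f_{0}(\boldsymbol{y})$, so $\boldsymbol{x}$ is in fact a global minimizer.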

Deterministic vs. stochastic

  • stochastic methods (e.g., stochastic gradient) replace exact gradients with cheap noisy estimates computed from random data samples (a minimal sketch follows)
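
A minimal sketch of stochastic gradient descent on a least-squares objective; the data, step size, and iteration count are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.standard_normal((n, d))
theta_true = rng.standard_normal(d)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

# Minimize (1/2n) * ||X theta - y||^2 using one randomly sampled example per step.
theta = np.zeros(d)
step = 0.01
for t in range(20000):
    i = rng.integers(n)                    # pick a random data point
    grad_i = (X[i] @ theta - y[i]) * X[i]  # unbiased estimate of the full gradient
    theta -= step * grad_i
```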

Solvability vs. scalability

  • Polynomial-time algorithms might be useless in large-scale applications
    • example: Newton's method needs only a few iterations (low iteration complexity), but
    • a single iteration may take forever (high per-iteration time)
    • prohibitive storage requirements (high per-iteration memory)
  • Iteration complexity vs. per-iteration cost
    • computational cost = iteration complexity (#iterations) $\times$ cost per iteration (a rough comparison is sketched after this list)
  • Large-scale problems call for methods with ==cheap iterations==
    • First-order methods: exploit only information on function values and (sub)gradients without Hessian information
      • cheap iterations
      • low memory requirements
    • Randomized and approximation methods: project data into a lower-dimensional subspace and ==solve the reduced-dimension problem==
      • optimization for high-dimensional data analysis: polynomial-time algorithms are often not fast enough, so ==further approximations== are essential
    • Advanced large-scale optimization
      • randomized methods; nonconvex optimization on manifold; learning to optimize; deep reinforcement learning
      • goal: scalable, real-time, parallel, distributed, automatic, etc.
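
To make the "cheap iterations" point concrete, here is a rough per-step cost comparison on a least-squares objective $f(\boldsymbol{x})=\frac{1}{2}\|\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}\|^{2}$; the sizes are hypothetical, and constants or sparsity can change the picture in practice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 500
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x = np.zeros(d)

grad = A.T @ (A @ x - b)  # gradient: O(n*d) time, O(d) extra memory

# First-order step: move along the negative gradient (many cheap iterations).
x_gd = x - 1e-4 * grad

# Second-order (Newton) step: form and solve with the d-by-d Hessian
# (O(n*d^2) to form, O(d^2) memory, O(d^3) to solve), i.e., few but expensive iterations.
H = A.T @ A
x_newton = x - np.linalg.solve(H, grad)  # for this quadratic, one step lands on the least-squares solution
```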

High-dimensional statistics

Convex geometry

Local geometry

Global geometry

Topics and grading

Theoretical foundations

convex sets, convex functions, convex problems, Lagrange duality and KKT conditions, disciplined convex programming

First-order methods

gradient methods, subgradient methods, proximal methods

Second-order methods

Newton methods, interior-point methods, quasi-Newton methods

Stochastic methods

stochastic gradient methods, stochastic Newton methods, randomized sketching methods, randomized linear algebra

Machine learning approaches

Applications