# Section 4.6 Best Approximation and Least Squares

Remark: Given a nonzero vector $\overrightarrow{u}$ in $\mathbb{R}^{n}$, consider the problem of decomposing a vector $\overrightarrow{y}$ in $\mathbb{R}^{n}$ into the sum of two vectors, one a multiple of $\overrightarrow{u}$ and the other orthogonal to $\overrightarrow{u}$. That is, we wish to write $\overrightarrow{y}=\alpha\overrightarrow{u}+\overrightarrow{z}$ for some scalar $\alpha$ and some vector $\overrightarrow{z}$ orthogonal to $\overrightarrow{u}$, so that $\overrightarrow{z}=\overrightarrow{y}-\alpha\overrightarrow{u}$. Now $\overrightarrow{y}-\alpha\overrightarrow{u}$ is orthogonal to $\overrightarrow{u}$ if and only if $(\overrightarrow{y}-\alpha\overrightarrow{u})\cdot\overrightarrow{u}=0$, if and only if $\overrightarrow{y}\cdot\overrightarrow{u}-\alpha\,\overrightarrow{u}\cdot\overrightarrow{u}=0$, if and only if $\alpha=\frac{\overrightarrow{y}\cdot\overrightarrow{u}}{\overrightarrow{u}\cdot\overrightarrow{u}}$, which is exactly the weight of $\overrightarrow{u}$ in the linear combination representing $\overrightarrow{y}$ with respect to an orthogonal set $S$ containing $\overrightarrow{u}$. The vector $\widehat{y}=\frac{\overrightarrow{y}\cdot\overrightarrow{u}}{\overrightarrow{u}\cdot\overrightarrow{u}}\overrightarrow{u}$ is called the orthogonal projection of $\overrightarrow{y}$ onto $\overrightarrow{u}$, and the vector $\overrightarrow{z}=\overrightarrow{y}-\widehat{y}$ is called the component of $\overrightarrow{y}$ orthogonal to $\overrightarrow{u}$. Notice $\widehat{y}=\frac{\overrightarrow{y}\cdot\overrightarrow{u}}{\overrightarrow{u}\cdot\overrightarrow{u}}\overrightarrow{u}=\frac{\overrightarrow{y}\cdot(c\overrightarrow{u})}{(c\overrightarrow{u})\cdot(c\overrightarrow{u})}(c\overrightarrow{u})$ for any $c\neq0$. Hence $\widehat{y}$ is determined by the subspace $L$ spanned by $\overrightarrow{u}$ (the line through $\overrightarrow{0}$ and $\overrightarrow{u}$).
$\widehat{y}$ is denoted by proj$_{L}\overrightarrow{y}$ and is called the orthogonal projection of $\overrightarrow{y}$ onto $L$.
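The projection formula can be sketched numerically as follows (using NumPy; the helper name `proj_onto_line` is ours, not from the text):

```python
import numpy as np

def proj_onto_line(y, u):
    """Orthogonal projection of y onto the line L spanned by the nonzero vector u."""
    return (y @ u) / (u @ u) * u

y = np.array([1.0, 2.0, 2.0])
u = np.array([3.0, 0.0, 4.0])
y_hat = proj_onto_line(y, u)   # alpha = (y.u)/(u.u) = 11/25, so y_hat = (11/25) u
z = y - y_hat                  # component of y orthogonal to u

# Scaling u by any c != 0 leaves the projection unchanged,
# since y_hat depends only on the line L = Span{u}.
same = proj_onto_line(y, 2.5 * u)
```

The last line illustrates the remark above: replacing $\overrightarrow{u}$ by $c\overrightarrow{u}$ gives the same $\widehat{y}$.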

Example 1: Let $\overrightarrow{y}=\left[\begin{array}{c} 3\\ 4\\ 5 \end{array}\right]$ and $\overrightarrow{u}=\left[\begin{array}{c} 1\\ -2\\ 3 \end{array}\right]$.

Find the orthogonal projection of $\overrightarrow{y}$ onto $\overrightarrow{u}$.
Then write $\overrightarrow{y}$ as the sum of two orthogonal vectors, one in Span$\{\overrightarrow{u}\}$ and one orthogonal to $\overrightarrow{u}$.

Exercise 1:  Let $\overrightarrow{y}=\left[\begin{array}{c} 2\\ -1\\ 3 \end{array}\right]$ and $\overrightarrow{u}=\left[\begin{array}{c} -3\\ 2\\ 4 \end{array}\right]$. Find the orthogonal projection of $\overrightarrow{y}$ onto $\overrightarrow{u}$. Then write $\overrightarrow{y}$ as the sum of two orthogonal vectors,
one in Span$\{\overrightarrow{u}\}$ and one orthogonal to $\overrightarrow{u}$.

The orthogonal projection of a point in $\mathbb{R}^{2}$ onto a line through the origin has an important analogue in $\mathbb{R}^{n}$. Given a vector $\overrightarrow{y}$ and a subspace $W$ in $\mathbb{R}^{n}$, there is a vector $\widehat{y}$ in $W$ such that $\widehat{y}$ is the unique vector in $W$ for which $\overrightarrow{y}-\widehat{y}$ is orthogonal to $W$, and $\widehat{y}$ is the unique vector in $W$ closest to $\overrightarrow{y}$. These two properties provide the key to finding the least-squares solutions of linear systems.

Theorem: Let $W$ be a subspace of $\mathbb{R}^{n}$. Then each $\overrightarrow{y}$ in $\mathbb{R}^{n}$ can be written uniquely in the form $\overrightarrow{y}=\widehat{y}+\overrightarrow{z}$ where $\widehat{y}$ is in $W$ and $\overrightarrow{z}$ is orthogonal to $W$. In fact, if $\{\overrightarrow{u_{1}},...,\overrightarrow{u_{p}}\}$ is any orthogonal basis of $W$, then $\widehat{y}=\frac{\overrightarrow{y}\cdot\overrightarrow{u_{1}}}{\overrightarrow{u_{1}}\cdot\overrightarrow{u_{1}}}\overrightarrow{u_{1}}+...+\frac{\overrightarrow{y}\cdot\overrightarrow{u_{p}}}{\overrightarrow{u_{p}}\cdot\overrightarrow{u_{p}}}\overrightarrow{u_{p}}$ and $\overrightarrow{z}=\overrightarrow{y}-\widehat{y}$.

Definition: The vector $\widehat{y}$ is called the orthogonal projection of $\overrightarrow{y}$ onto $W$ and often is written as $\mbox{proj}_{W}\overrightarrow{y}$.
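The theorem's formula translates directly into a short sketch (NumPy; `proj_onto_subspace` is our name, and it assumes the basis vectors passed in are mutually orthogonal):

```python
import numpy as np

def proj_onto_subspace(y, orthogonal_basis):
    """proj_W(y) for W spanned by a list of mutually orthogonal vectors."""
    return sum((y @ u) / (u @ u) * u for u in orthogonal_basis)

# An orthogonal basis of a plane W in R^3, and a vector y to decompose.
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])
y  = np.array([2.0, 3.0, 4.0])

y_hat = proj_onto_subspace(y, [u1, u2])   # component of y in W
z = y - y_hat                             # component of y orthogonal to W
```

Here $\widehat{y}=(2,3,0)$ and $\overrightarrow{z}=(0,0,4)$, matching the decomposition $\overrightarrow{y}=\widehat{y}+\overrightarrow{z}$.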

Proof sketch: With $\widehat{y}$ defined by the formula above, verify that $\overrightarrow{z}\cdot\overrightarrow{u_{i}}=(\overrightarrow{y}-\widehat{y})\cdot\overrightarrow{u_{i}}=0$ for each $i$, so $\overrightarrow{z}$ is orthogonal to every vector in $W$.

Example 2: Let

$W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ 1\\ -1 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 1\\ 1\\ 3 \end{array}\right]\}$.

Notice that $\{\overrightarrow{u_{1}},\overrightarrow{u_{2}}\}$
is an orthogonal basis of $W$. Let $\overrightarrow{y}=\left[\begin{array}{c} 3\\ 4\\ 5 \end{array}\right]$. Write $\overrightarrow{y}$ as the sum of a vector in $W$ and a
vector orthogonal to $W$.
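A numerical check of Example 2 (a sketch; the weights come straight from the projection formula in the theorem above):

```python
import numpy as np

u1 = np.array([2.0, 1.0, -1.0])
u2 = np.array([1.0, 1.0, 3.0])
y  = np.array([3.0, 4.0, 5.0])

# Weights: y.u1/u1.u1 = 5/6 and y.u2/u2.u2 = 22/11 = 2
y_hat = (y @ u1) / (u1 @ u1) * u1 + (y @ u2) / (u2 @ u2) * u2
z = y - y_hat   # should be orthogonal to both u1 and u2
```

This gives $\widehat{y}=(11/3,\,17/6,\,31/6)$ and $\overrightarrow{z}=\overrightarrow{y}-\widehat{y}$ orthogonal to $W$.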

Exercise 2: Let

$W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ 1\\ -1 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} -1\\ 5\\ 3 \end{array}\right]\}$.

Notice that $\{\overrightarrow{u_{1}},\overrightarrow{u_{2}}\}$
is an orthogonal basis of $W$. Let $\overrightarrow{y}=\left[\begin{array}{c} 4\\ 3\\ -4 \end{array}\right]$. Write $\overrightarrow{y}$ as the sum of a vector in $W$ and a
vector orthogonal to $W$.

Remark: If $\{\overrightarrow{u_{1}},...,\overrightarrow{u_{p}}\}$ is an orthogonal basis for $W$ and if $\overrightarrow{y}$ happens to be in $W$, then the formula for $\mbox{proj}_{W}\overrightarrow{y}$ is exactly the representation of $\overrightarrow{y}$ given in the theorem above, i.e. $\mbox{proj}_{W}\overrightarrow{y}=\overrightarrow{y}$.

Theorem: Let $W$ be a subspace of $\mathbb{R}^{n}$, let $\overrightarrow{y}$ be any vector in $\mathbb{R}^{n}$, and let $\widehat{y}$ be the orthogonal projection of $\overrightarrow{y}$ onto $W$. Then $\widehat{y}$ is the closest point in $W$ to $\overrightarrow{y}$, in the sense that $||\overrightarrow{y}-\widehat{y}||<||\overrightarrow{y}-\overrightarrow{v}||$ for all $\overrightarrow{v}$ in $W$ distinct from $\widehat{y}$.

Remark: The vector $\widehat{y}$ is called the best approximation to $\overrightarrow{y}$ by elements of $W$. The distance from $\overrightarrow{y}$ to $\overrightarrow{v}$, given by $||\overrightarrow{y}-\overrightarrow{v}||$, can be regarded as the "error" of using $\overrightarrow{v}$ in place of $\overrightarrow{y}$. The theorem says that this error is minimized when $\overrightarrow{v}=\widehat{y}$. This theorem also shows that $\widehat{y}$ does not depend on the particular orthogonal basis used to compute it.

Example 3: The distance from a point $\overrightarrow{y}$ in $\mathbb{R}^{n}$ to a subspace $W$ is defined as the distance from $\overrightarrow{y}$ to the nearest point in $W$. Find the
distance from $\overrightarrow{y}$ to $W$, where $W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} -2\\ 3\\ 0 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 3\\ 2\\ 1 \end{array}\right]\}$ and $\overrightarrow{y}=\left[\begin{array}{c} 2\\ 1\\ 2 \end{array}\right]$.
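A sketch of the computation in Example 3 (NumPy): project $\overrightarrow{y}$ onto $W$ using the orthogonal basis, then measure the length of the orthogonal component.

```python
import numpy as np

u1 = np.array([-2.0, 3.0, 0.0])
u2 = np.array([3.0, 2.0, 1.0])
y  = np.array([2.0, 1.0, 2.0])

# Project y onto W = Span{u1, u2}; u1 and u2 are orthogonal.
y_hat = (y @ u1) / (u1 @ u1) * u1 + (y @ u2) / (u2 @ u2) * u2

# Distance from y to W is the length of the component orthogonal to W.
dist = np.linalg.norm(y - y_hat)
```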

Exercise 3: The distance from a point $\overrightarrow{y}$ in $\mathbb{R}^{n}$ to a subspace $W$ is defined as the distance from $\overrightarrow{y}$ to the nearest point in $W$. Find the distance from $\overrightarrow{y}$ to $W$, where $W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} -1\\ 2\\ -1 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 1\\ 1\\ 1 \end{array}\right]\}$ and $\overrightarrow{y}=\left[\begin{array}{c} 3\\ 4\\ 5 \end{array}\right]$.

Example 4: Find the best approximation to $\overrightarrow{y}$ by vectors of the form $c_{1}\overrightarrow{u_{1}}+c_{2}\overrightarrow{u_{2}}$ where $\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ -3\\ 2\\ 2 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 2\\ 2\\ 0\\ 1 \end{array}\right]$ and $\overrightarrow{y}=\left[\begin{array}{c} 2\\ 1\\ 5\\ 3 \end{array}\right]$.

Exercise 4: Find the best approximation to $\overrightarrow{y}$ by vectors of the form $c_{1}\overrightarrow{u_{1}}+c_{2}\overrightarrow{u_{2}}$

where $\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ -3\\ 1\\ 4 \end{array}\right]$ , $\overrightarrow{u_{2}}=\left[\begin{array}{c} -3\\ 0\\ 2\\ 1 \end{array}\right]$  and $\overrightarrow{y}=\left[\begin{array}{c} 2\\ -1\\ 4\\ 3 \end{array}\right]$.

Definition: If $A$ is $m\times n$ and $\overrightarrow{b}$ is in $\mathbb{R}^{m}$, a least-squares solution of $A\overrightarrow{x}=\overrightarrow{b}$ is an $\widehat{x}$ in $\mathbb{R}^{n}$ such that $||\overrightarrow{b}-A\widehat{x}||\leq||\overrightarrow{b}-A\overrightarrow{x}||$ for all $\overrightarrow{x}$ in $\mathbb{R}^{n}$.

Remark: 1. Given $A$ and $\overrightarrow{b}$, apply the Best Approximation Theorem to the subspace $\mbox{Col}A$ and let $\widehat{b}=\mbox{proj}_{\mbox{Col}A}\overrightarrow{b}$. Because $\widehat{b}$ is in the column space of $A$, the equation $A\overrightarrow{x}=\widehat{b}$ is consistent, and there is an $\widehat{x}$ in $\mathbb{R}^{n}$ such that $A\widehat{x}=\widehat{b}$. Since $\widehat{b}$ is the closest point in $\mbox{Col}A$ to $\overrightarrow{b}$, a vector $\widehat{x}$ is a least-squares solution of $A\overrightarrow{x}=\overrightarrow{b}$ if and only if $\widehat{x}$ satisfies $A\widehat{x}=\widehat{b}=\mbox{proj}_{\mbox{Col}A}\overrightarrow{b}$.

2. $\widehat{b}$ has the property that $\overrightarrow{b}-\widehat{b}$ is orthogonal to $\mbox{Col}A$, so $\overrightarrow{b}-A\widehat{x}$ is orthogonal to each column of $A$, i.e. $\overrightarrow{a_{j}}\cdot(\overrightarrow{b}-A\widehat{x})=0$ for every column $\overrightarrow{a_{j}}$ of $A$, or equivalently $A^{T}(\overrightarrow{b}-A\widehat{x})=\overrightarrow{0}$, which is equivalent to $A^{T}A\widehat{x}=A^{T}\overrightarrow{b}$.

Definition: The equation $A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}$ is called the normal equations for $A\overrightarrow{x}=\overrightarrow{b}$. A solution of $A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}$ is often denoted by $\widehat{x}$.

Theorem: The set of least-squares solutions of $A\overrightarrow{x}=\overrightarrow{b}$ coincides with the nonempty set of solutions of the normal equations $A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}$.
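In practice the normal equations give a direct way to compute a least-squares solution. A sketch (the helper name is ours; `np.linalg.lstsq` is used only as an independent cross-check):

```python
import numpy as np

def least_squares_normal_eq(A, b):
    """Solve the normal equations A^T A x = A^T b;
    assumes the columns of A are linearly independent."""
    return np.linalg.solve(A.T @ A, A.T @ b)

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = least_squares_normal_eq(A, b)
residual = b - A @ x_hat   # orthogonal to Col A, so A^T residual = 0
```

Here $A^{T}A=\left[\begin{array}{cc} 3 & 3\\ 3 & 5 \end{array}\right]$ and $A^{T}\overrightarrow{b}=\left[\begin{array}{c} 6\\ 0 \end{array}\right]$, giving $\widehat{x}=(5,-3)$.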

Example 5: Find a least-squares solution of the inconsistent system for

$A=\left[\begin{array}{cc} 3 & 0\\ 0 & 2\\ -1 & 1 \end{array}\right]$, $\overrightarrow{b}=\left[\begin{array}{c} 1\\ 0\\ 3 \end{array}\right]$.

Exercise 5: Find a least-squares solution of the inconsistent system for $A=\left[\begin{array}{cc} 2 & 1\\ -1 & 0\\ 0 & 2 \end{array}\right]$, $\overrightarrow{b}=\left[\begin{array}{c} 0\\ 1\\ 2 \end{array}\right]$.

Theorem: Let $A$ be an $m\times n$ matrix. The following statements are logically equivalent:

(a) The equation $A\overrightarrow{x}=\overrightarrow{b}$ has a unique least-squares solution for each $\overrightarrow{b}$ in $\mathbb{R}^{m}$.

(b) The columns of $A$ are linearly independent.

(c) The matrix $A^{T}A$ is invertible.

When these statements are true, the least-squares solution $\widehat{x}$ is given by $\widehat{x}=(A^{T}A)^{-1}A^{T}\overrightarrow{b}$.
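When $A^{T}A$ is invertible, the closed-form expression can be evaluated directly. A sketch using the data of Example 5 above, which also serves as a check of that computation:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [-1.0, 1.0]])
b = np.array([1.0, 0.0, 3.0])

# Closed form x_hat = (A^T A)^{-1} A^T b; valid because the columns
# of A are linearly independent, so A^T A is invertible.
x_hat = np.linalg.inv(A.T @ A) @ A.T @ b

# Least-squares error of the approximation A x_hat to b.
error = np.linalg.norm(b - A @ x_hat)
```

This yields $\widehat{x}=(3/49,\,30/49)$ with least-squares error $20/7$. (For larger problems, solving the normal equations or using a library least-squares routine is preferred over forming the explicit inverse.)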

Remark: When a least-squares solution $\widehat{x}$ is used to produce $A\widehat{x}=\widehat{b}$ as an approximation to $\overrightarrow{b}$, the distance from $\overrightarrow{b}$ to $A\widehat{x}=\widehat{b}$ is called the least-squares error of this approximation.

Example 6: Find a least-squares solution of the inconsistent system for
$A=\left[\begin{array}{cc} 1 & 1\\ 1 & 0\\ 0 & 2\\ -1 & 1 \end{array}\right]$, $\overrightarrow{b}=\left[\begin{array}{c} 1\\ 0\\ 2\\ 3 \end{array}\right]$.

Exercise 6: Find a least-squares solution of the inconsistent system for
$A=\left[\begin{array}{cc} 0 & 1\\ 1 & 0\\ -2 & 1\\ -1 & -2 \end{array}\right]$  , $\overrightarrow{b}=\left[\begin{array}{c} 1\\ -1\\ 0\\ 2 \end{array}\right]$

Group Work 1: True or False. $A$ is an $m\times n$ matrix and $\overrightarrow{b}$ is in $\mathbb{R}^{m}$.

a. The general least squares problem is to find an $\overrightarrow{x}$ that makes $A\overrightarrow{x}$ as close as possible to $\overrightarrow{b}$.

b. A least squares solution of $A\overrightarrow{x}=\overrightarrow{b}$ is a vector $\widehat{x}$ that satisfies $A\widehat{x}=\widehat{b}$ where $\widehat{b}$ is the orthogonal projection of $\overrightarrow{b}$ onto $\mbox{Col}A$.

c. A least squares solution of $A\overrightarrow{x}=\overrightarrow{b}$ is a vector $\widehat{x}$ such that $||\overrightarrow{b}-A\overrightarrow{x}||\leq||\overrightarrow{b}-A\widehat{x}||$ for all $\overrightarrow{x}$ in $\mathbb{R}^{n}$.

d. Any solution of $A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}$ is a least squares solution of $A\overrightarrow{x}=\overrightarrow{b}$.

e. If the columns of $A$ are linearly independent, then the equation $A\overrightarrow{x}=\overrightarrow{b}$ has exactly one least squares solution.

f. For each $\overrightarrow{y}$ and each subspace $W$, the vector $\overrightarrow{y}-\mbox{proj}_{W}\overrightarrow{y}$ is orthogonal to $W$.

g. The orthogonal projection $\widehat{y}$ of $\overrightarrow{y}$ onto a subspace $W$ can sometimes depend on the orthogonal basis for $W$ used to compute $\widehat{y}$.

Group Work 2: Find a formula for the least squares solution of $A\overrightarrow{x}=\overrightarrow{b}$ when the columns of $A$ are orthonormal.

Group Work 3: True or False. $A$ is an $m\times n$ matrix and $\overrightarrow{b}$ is in $\mathbb{R}^{m}$.

a. If $\overrightarrow{b}$ is in the column space of $A$, then every solution of $A\overrightarrow{x}=\overrightarrow{b}$ is a least squares solution.

b. The least squares solution of $A\overrightarrow{x}=\overrightarrow{b}$ is the point in the column space of $A$ closest to $\overrightarrow{b}$.

c. A least squares solution of $A\overrightarrow{x}=\overrightarrow{b}$ is a list of weights that, when applied to the columns of $A$, produces the orthogonal projection of $\overrightarrow{b}$ onto $\mbox{Col}A$.

d. If $\widehat{x}$ is a least squares solution of $A\overrightarrow{x}=\overrightarrow{b}$, then $\widehat{x}=(A^{T}A)^{-1}A^{T}\overrightarrow{b}$.

e. If $\overrightarrow{y}$ is in a subspace $W$, then the orthogonal projection of $\overrightarrow{y}$ onto $W$ is $\overrightarrow{y}$ itself.

f. The best approximation to $\overrightarrow{y}$ by elements of a subspace $W$ is given by the vector $\overrightarrow{y}-\mbox{proj}_{W}\overrightarrow{y}$.

Group Work 4: Describe all least squares solutions of the system

$x+y=2$

$x+y=4$