Section 4.6 Best Approximation and Least Squares

Remark: Given a nonzero vector [latex]\overrightarrow{u}[/latex] in [latex]\mathbb{R}^{n}[/latex], consider the problem of decomposing a vector [latex]\overrightarrow{y}[/latex] in [latex]\mathbb{R}^{n}[/latex] into the sum of two vectors, one a multiple of [latex]\overrightarrow{u}[/latex] and the other orthogonal to [latex]\overrightarrow{u}[/latex]. That is, we wish to write [latex]\overrightarrow{y}=\alpha\overrightarrow{u}+\overrightarrow{z}[/latex] for some scalar [latex]\alpha[/latex] and some vector [latex]\overrightarrow{z}[/latex] orthogonal to [latex]\overrightarrow{u}[/latex], so that [latex]\overrightarrow{z}=\overrightarrow{y}-\alpha\overrightarrow{u}[/latex]. Now [latex]\overrightarrow{y}-\alpha\overrightarrow{u}[/latex] is orthogonal to [latex]\overrightarrow{u}[/latex] if and only if [latex](\overrightarrow{y}-\alpha\overrightarrow{u})\cdot\overrightarrow{u}=0[/latex], if and only if [latex]\overrightarrow{y}\cdot\overrightarrow{u}-\alpha\overrightarrow{u}\cdot\overrightarrow{u}=0[/latex], if and only if [latex]\alpha=\frac{\overrightarrow{y}\cdot\overrightarrow{u}}{\overrightarrow{u}\cdot\overrightarrow{u}}[/latex], which is exactly the weight of [latex]\overrightarrow{y}[/latex] with respect to [latex]\overrightarrow{u}[/latex] in the linear combination formula for an orthogonal set [latex]S[/latex] containing [latex]\overrightarrow{u}[/latex]. The vector [latex]\widehat{y}=\frac{\overrightarrow{y}\cdot\overrightarrow{u}}{\overrightarrow{u}\cdot\overrightarrow{u}}\overrightarrow{u}[/latex] is called the orthogonal projection of [latex]\overrightarrow{y}[/latex] onto [latex]\overrightarrow{u}[/latex], and the vector [latex]\overrightarrow{z}=\overrightarrow{y}-\widehat{y}[/latex] is called the component of [latex]\overrightarrow{y}[/latex] orthogonal to [latex]\overrightarrow{u}[/latex]. Notice that [latex]\widehat{y}=\frac{\overrightarrow{y}\cdot\overrightarrow{u}}{\overrightarrow{u}\cdot\overrightarrow{u}}\overrightarrow{u}=\frac{\overrightarrow{y}\cdot(c\overrightarrow{u})}{(c\overrightarrow{u})\cdot(c\overrightarrow{u})}(c\overrightarrow{u})[/latex] whenever [latex]c\neq0[/latex]. Hence [latex]\widehat{y}[/latex] is determined by the subspace [latex]L[/latex] spanned by [latex]\overrightarrow{u}[/latex] (the line through [latex]0[/latex] and [latex]\overrightarrow{u}[/latex]). We write [latex]\widehat{y}=\mbox{proj}_{L}\overrightarrow{y}[/latex] and call it the orthogonal projection of [latex]\overrightarrow{y}[/latex] onto [latex]L[/latex].

 

Example 1: Let [latex]\overrightarrow{y}=\left[\begin{array}{c} 3\\ 4\\ 5 \end{array}\right][/latex] and [latex]\overrightarrow{u}=\left[\begin{array}{c} 1\\ -2\\ 3 \end{array}\right][/latex].

Find the orthogonal projection of [latex]\overrightarrow{y}[/latex] onto [latex]\overrightarrow{u}[/latex].
Then write [latex]\overrightarrow{y}[/latex] as the sum of two orthogonal vectors, one in Span[latex]\{\overrightarrow{u}\}[/latex] and one orthogonal to [latex]\overrightarrow{u}[/latex].
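
One possible solution, using the projection formula from the remark above: [latex]\overrightarrow{y}\cdot\overrightarrow{u}=3(1)+4(-2)+5(3)=10[/latex] and [latex]\overrightarrow{u}\cdot\overrightarrow{u}=1+4+9=14[/latex], so

[latex]\widehat{y}=\frac{\overrightarrow{y}\cdot\overrightarrow{u}}{\overrightarrow{u}\cdot\overrightarrow{u}}\overrightarrow{u}=\frac{10}{14}\left[\begin{array}{c} 1\\ -2\\ 3 \end{array}\right]=\left[\begin{array}{c} 5/7\\ -10/7\\ 15/7 \end{array}\right][/latex] and [latex]\overrightarrow{z}=\overrightarrow{y}-\widehat{y}=\left[\begin{array}{c} 16/7\\ 38/7\\ 20/7 \end{array}\right][/latex].

As a check, [latex]\overrightarrow{z}\cdot\overrightarrow{u}=\frac{16}{7}-\frac{76}{7}+\frac{60}{7}=0[/latex], so [latex]\overrightarrow{y}=\widehat{y}+\overrightarrow{z}[/latex] is the required decomposition, with [latex]\widehat{y}[/latex] in Span[latex]\{\overrightarrow{u}\}[/latex] and [latex]\overrightarrow{z}[/latex] orthogonal to [latex]\overrightarrow{u}[/latex].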

 

 

Exercise 1:  Let [latex]\overrightarrow{y}=\left[\begin{array}{c} 2\\ -1\\ 3 \end{array}\right][/latex] and [latex]\overrightarrow{u}=\left[\begin{array}{c} -3\\ 2\\ 4 \end{array}\right][/latex]. Find the orthogonal projection of [latex]\overrightarrow{y}[/latex] onto [latex]\overrightarrow{u}[/latex]. Then write [latex]\overrightarrow{y}[/latex] as the sum of two orthogonal vectors,
one in Span[latex]\{\overrightarrow{u}\}[/latex] and one orthogonal to [latex]\overrightarrow{u}[/latex].

The orthogonal projection of a point in [latex]\mathbb{R}^{2}[/latex] onto a line through the origin has an important analogue in [latex]\mathbb{R}^{n}[/latex]. Given a vector [latex]\overrightarrow{y}[/latex] and a subspace [latex]W[/latex] in [latex]\mathbb{R}^{n}[/latex], there is a unique vector [latex]\widehat{y}[/latex] in [latex]W[/latex] for which [latex]\overrightarrow{y}-\widehat{y}[/latex] is orthogonal to [latex]W[/latex], and this [latex]\widehat{y}[/latex] is also the unique vector in [latex]W[/latex] closest to [latex]\overrightarrow{y}[/latex]. These two properties of [latex]\widehat{y}[/latex] provide the key to finding the least-squares solutions of linear systems.

 

Theorem: Let [latex]W[/latex] be a subspace of [latex]\mathbb{R}^{n}[/latex]. Then each [latex]\overrightarrow{y}[/latex] in [latex]\mathbb{R}^{n}[/latex] can be written uniquely in the form [latex]\overrightarrow{y}=\widehat{y}+\overrightarrow{z}[/latex] where [latex]\widehat{y}[/latex] is in [latex]W[/latex] and [latex]\overrightarrow{z}[/latex] is orthogonal to [latex]W[/latex]. In fact, if [latex]\{\overrightarrow{u_{1}},...,\overrightarrow{u_{p}}\}[/latex] is any orthogonal basis of [latex]W[/latex], then [latex]\widehat{y}=\frac{\overrightarrow{y}\cdot\overrightarrow{u_{1}}}{\overrightarrow{u_{1}}\cdot\overrightarrow{u_{1}}}\overrightarrow{u_{1}}+...+\frac{\overrightarrow{y}\cdot\overrightarrow{u_{p}}}{\overrightarrow{u_{p}}\cdot\overrightarrow{u_{p}}}\overrightarrow{u_{p}}[/latex] and [latex]\overrightarrow{z}=\overrightarrow{y}-\widehat{y}[/latex].

 

Definition: The vector [latex]\widehat{y}[/latex] is called the orthogonal projection of [latex]\overrightarrow{y}[/latex] onto [latex]W[/latex] and often is written as [latex]\mbox{proj}_{W}\overrightarrow{y}[/latex].

 

Proof: With [latex]\widehat{y}[/latex] defined as above, for each [latex]i[/latex] we have [latex]\overrightarrow{z}\cdot\overrightarrow{u_{i}}=(\overrightarrow{y}-\widehat{y})\cdot\overrightarrow{u_{i}}=\overrightarrow{y}\cdot\overrightarrow{u_{i}}-\frac{\overrightarrow{y}\cdot\overrightarrow{u_{i}}}{\overrightarrow{u_{i}}\cdot\overrightarrow{u_{i}}}\overrightarrow{u_{i}}\cdot\overrightarrow{u_{i}}=0[/latex], since [latex]\overrightarrow{u_{j}}\cdot\overrightarrow{u_{i}}=0[/latex] for [latex]j\neq i[/latex]. Hence [latex]\overrightarrow{z}[/latex] is orthogonal to every vector in [latex]W[/latex]. For uniqueness, if [latex]\overrightarrow{y}=\widehat{y_{1}}+\overrightarrow{z_{1}}=\widehat{y_{2}}+\overrightarrow{z_{2}}[/latex] are two such decompositions, then [latex]\widehat{y_{1}}-\widehat{y_{2}}=\overrightarrow{z_{2}}-\overrightarrow{z_{1}}[/latex] is a vector in [latex]W[/latex] that is also orthogonal to [latex]W[/latex], hence is [latex]\overrightarrow{0}[/latex].

 

 

Example 2: Let [latex]W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ 1\\ -1 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 1\\ 1\\ 3 \end{array}\right]\}[/latex]. Notice that [latex]\{\overrightarrow{u_{1}},\overrightarrow{u_{2}}\}[/latex]
is an orthogonal basis of [latex]W[/latex]. Let [latex]\overrightarrow{y}=\left[\begin{array}{c} 3\\ 4\\ 5 \end{array}\right][/latex]. Write [latex]\overrightarrow{y}[/latex] as the sum of a vector in [latex]W[/latex] and a vector orthogonal to [latex]W[/latex].
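
One possible solution, applying the theorem above to the orthogonal basis [latex]\{\overrightarrow{u_{1}},\overrightarrow{u_{2}}\}[/latex]: here [latex]\overrightarrow{y}\cdot\overrightarrow{u_{1}}=5[/latex], [latex]\overrightarrow{u_{1}}\cdot\overrightarrow{u_{1}}=6[/latex], [latex]\overrightarrow{y}\cdot\overrightarrow{u_{2}}=22[/latex], [latex]\overrightarrow{u_{2}}\cdot\overrightarrow{u_{2}}=11[/latex], so

[latex]\widehat{y}=\frac{5}{6}\left[\begin{array}{c} 2\\ 1\\ -1 \end{array}\right]+\frac{22}{11}\left[\begin{array}{c} 1\\ 1\\ 3 \end{array}\right]=\left[\begin{array}{c} 11/3\\ 17/6\\ 31/6 \end{array}\right][/latex] and [latex]\overrightarrow{z}=\overrightarrow{y}-\widehat{y}=\left[\begin{array}{c} -2/3\\ 7/6\\ -1/6 \end{array}\right][/latex].

Since [latex]\overrightarrow{z}\cdot\overrightarrow{u_{1}}=0[/latex] and [latex]\overrightarrow{z}\cdot\overrightarrow{u_{2}}=0[/latex], the decomposition [latex]\overrightarrow{y}=\widehat{y}+\overrightarrow{z}[/latex] has [latex]\widehat{y}[/latex] in [latex]W[/latex] and [latex]\overrightarrow{z}[/latex] orthogonal to [latex]W[/latex].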

 

 

Exercise 2: Let [latex]W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ 1\\ -1 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} -1\\ 5\\ 3 \end{array}\right]\}[/latex]. Notice that [latex]\{\overrightarrow{u_{1}},\overrightarrow{u_{2}}\}[/latex]
is an orthogonal basis of [latex]W[/latex]. Let [latex]\overrightarrow{y}=\left[\begin{array}{c} 4\\ 3\\ -4 \end{array}\right][/latex]. Write [latex]\overrightarrow{y}[/latex] as the sum of a vector in [latex]W[/latex] and a
vector orthogonal to [latex]W[/latex].

 

Remark: If [latex]\{\overrightarrow{u_{1}},...,\overrightarrow{u_{p}}\}[/latex] is an orthogonal basis for [latex]W[/latex] and if [latex]\overrightarrow{y}[/latex] happens to be in [latex]W[/latex], then the formula for [latex]\mbox{proj}_{W}\overrightarrow{y}[/latex] is exactly the representation of [latex]\overrightarrow{y}[/latex] in terms of that orthogonal basis given in the theorem above, i.e. [latex]\mbox{proj}_{W}\overrightarrow{y}=\overrightarrow{y}[/latex].

 

Theorem: Let [latex]W[/latex] be a subspace of [latex]\mathbb{R}^{n}[/latex], let [latex]\overrightarrow{y}[/latex] be any vector in [latex]\mathbb{R}^{n}[/latex], and let [latex]\widehat{y}[/latex] be the orthogonal projection of [latex]\overrightarrow{y}[/latex] onto [latex]W[/latex]. Then [latex]\widehat{y}[/latex] is the closest point in [latex]W[/latex] to [latex]\overrightarrow{y}[/latex], in the sense that [latex]||\overrightarrow{y}-\widehat{y}||<||\overrightarrow{y}-\overrightarrow{v}||[/latex] for all [latex]\overrightarrow{v}[/latex] in [latex]W[/latex] distinct from [latex]\widehat{y}[/latex].

 

Remark: The vector [latex]\widehat{y}[/latex] is called the best approximation to [latex]\overrightarrow{y}[/latex] by elements of [latex]W[/latex]. The distance from [latex]\overrightarrow{y}[/latex] to [latex]\overrightarrow{v}[/latex], given by [latex]||\overrightarrow{y}-\overrightarrow{v}||[/latex], can be regarded as the "error" of using [latex]\overrightarrow{v}[/latex] in place of [latex]\overrightarrow{y}[/latex]. The theorem says that this error is minimized when [latex]\overrightarrow{v}=\widehat{y}[/latex]. The theorem also shows that [latex]\widehat{y}[/latex] does not depend on the particular orthogonal basis used to compute it.

 

Example 3: The distance from a point [latex]\overrightarrow{y}[/latex] in [latex]\mathbb{R}^{n}[/latex] to a subspace [latex]W[/latex] is defined as the distance from [latex]\overrightarrow{y}[/latex] to the nearest point in [latex]W[/latex]. Find the
distance from [latex]\overrightarrow{y}[/latex] to [latex]W[/latex], where [latex]W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} -2\\ 3\\ 0 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 3\\ 2\\ 1 \end{array}\right]\}[/latex] and [latex]\overrightarrow{y}=\left[\begin{array}{c} 2\\ 1\\ 2 \end{array}\right][/latex].
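
One possible solution: by the theorem above, the nearest point in [latex]W[/latex] is [latex]\mbox{proj}_{W}\overrightarrow{y}[/latex], so the distance is [latex]||\overrightarrow{y}-\mbox{proj}_{W}\overrightarrow{y}||[/latex]. Here [latex]\overrightarrow{y}\cdot\overrightarrow{u_{1}}=-1[/latex], [latex]\overrightarrow{u_{1}}\cdot\overrightarrow{u_{1}}=13[/latex], [latex]\overrightarrow{y}\cdot\overrightarrow{u_{2}}=10[/latex], [latex]\overrightarrow{u_{2}}\cdot\overrightarrow{u_{2}}=14[/latex], so

[latex]\mbox{proj}_{W}\overrightarrow{y}=-\frac{1}{13}\left[\begin{array}{c} -2\\ 3\\ 0 \end{array}\right]+\frac{5}{7}\left[\begin{array}{c} 3\\ 2\\ 1 \end{array}\right]=\frac{1}{91}\left[\begin{array}{c} 209\\ 109\\ 65 \end{array}\right][/latex] and [latex]\overrightarrow{y}-\mbox{proj}_{W}\overrightarrow{y}=\frac{1}{91}\left[\begin{array}{c} -27\\ -18\\ 117 \end{array}\right][/latex].

The distance is therefore [latex]\frac{1}{91}\sqrt{(-27)^{2}+(-18)^{2}+117^{2}}=\frac{\sqrt{14742}}{91}=\frac{9\sqrt{182}}{91}\approx1.33[/latex].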

Exercise 3: The distance from a point [latex]\overrightarrow{y}[/latex] in [latex]\mathbb{R}^{n}[/latex] to a subspace [latex]W[/latex] is defined as the distance from [latex]\overrightarrow{y}[/latex] to the nearest point in [latex]W[/latex]. Find the distance from [latex]\overrightarrow{y}[/latex] to [latex]W[/latex], where [latex]W=\mbox{Span}\{\overrightarrow{u_{1}}=\left[\begin{array}{c} -1\\ 2\\ -1 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 1\\ 1\\ 1 \end{array}\right]\}[/latex] and [latex]\overrightarrow{y}=\left[\begin{array}{c} 3\\ 4\\ 5 \end{array}\right][/latex].

 

Example 4: Find the best approximation to [latex]\overrightarrow{y}[/latex] by vectors of the form [latex]c_{1}\overrightarrow{u_{1}}+c_{2}\overrightarrow{u_{2}}[/latex] where [latex]\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ -3\\ 2\\ 2 \end{array}\right],\overrightarrow{u_{2}}=\left[\begin{array}{c} 2\\ 2\\ 0\\ 1 \end{array}\right][/latex] and [latex]\overrightarrow{y}=\left[\begin{array}{c} 2\\ 1\\ 5\\ 3 \end{array}\right][/latex].
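
One possible solution: since [latex]\overrightarrow{u_{1}}\cdot\overrightarrow{u_{2}}=4-6+0+2=0[/latex], the vectors of the form [latex]c_{1}\overrightarrow{u_{1}}+c_{2}\overrightarrow{u_{2}}[/latex] make up the subspace [latex]W=\mbox{Span}\{\overrightarrow{u_{1}},\overrightarrow{u_{2}}\}[/latex] with [latex]\{\overrightarrow{u_{1}},\overrightarrow{u_{2}}\}[/latex] orthogonal, so the best approximation is [latex]\mbox{proj}_{W}\overrightarrow{y}[/latex]. With [latex]\overrightarrow{y}\cdot\overrightarrow{u_{1}}=17[/latex], [latex]\overrightarrow{u_{1}}\cdot\overrightarrow{u_{1}}=21[/latex], [latex]\overrightarrow{y}\cdot\overrightarrow{u_{2}}=9[/latex], [latex]\overrightarrow{u_{2}}\cdot\overrightarrow{u_{2}}=9[/latex],

[latex]\widehat{y}=\frac{17}{21}\left[\begin{array}{c} 2\\ -3\\ 2\\ 2 \end{array}\right]+\left[\begin{array}{c} 2\\ 2\\ 0\\ 1 \end{array}\right]=\left[\begin{array}{c} 76/21\\ -3/7\\ 34/21\\ 55/21 \end{array}\right][/latex].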

 

 

Exercise 4: Find the best approximation to [latex]\overrightarrow{y}[/latex] by vectors of the form [latex]c_{1}\overrightarrow{u_{1}}+c_{2}\overrightarrow{u_{2}}[/latex]
where [latex]\overrightarrow{u_{1}}=\left[\begin{array}{c} 2\\ -3\\ 1\\ 4 \end{array}\right][/latex], [latex]\overrightarrow{u_{2}}=\left[\begin{array}{c} -3\\ 0\\ 2\\ 1 \end{array}\right][/latex] and [latex]\overrightarrow{y}=\left[\begin{array}{c} 2\\ -1\\ 4\\ 3 \end{array}\right][/latex].

 

Definition: If [latex]A[/latex] is [latex]m\times n[/latex] and [latex]\overrightarrow{b}[/latex] is in [latex]\mathbb{R}^{m}[/latex], a least-squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] is an [latex]\widehat{x}[/latex] in [latex]\mathbb{R}^{n}[/latex] such that [latex]||\overrightarrow{b}-A\widehat{x}||\leq||\overrightarrow{b}-A\overrightarrow{x}||[/latex] for all [latex]\overrightarrow{x}[/latex] in [latex]\mathbb{R}^{n}[/latex].

 

Remark: 1. Given [latex]A[/latex] and [latex]\overrightarrow{b}[/latex], apply the Best Approximation Theorem to the subspace [latex]\mbox{Col}A[/latex] and let [latex]\widehat{b}=\mbox{proj}_{\mbox{Col}A}\overrightarrow{b}[/latex]. Because [latex]\widehat{b}[/latex] is in the column space of [latex]A[/latex], the equation [latex]A\overrightarrow{x}=\widehat{b}[/latex] is consistent, and there is an [latex]\widehat{x}[/latex] in [latex]\mathbb{R}^{n}[/latex] such that [latex]A\widehat{x}=\widehat{b}[/latex]. Since [latex]\widehat{b}[/latex] is the
closest point in [latex]\mbox{Col}A[/latex] to [latex]\overrightarrow{b}[/latex], a vector [latex]\widehat{x}[/latex] is a least-squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] if and only if [latex]\widehat{x}[/latex] satisfies [latex]A\widehat{x}=\widehat{b}=\mbox{proj}_{\mbox{Col}A}\overrightarrow{b}[/latex].

2. [latex]\widehat{b}[/latex] has the property that [latex]\overrightarrow{b}-\widehat{b}[/latex] is orthogonal to [latex]\mbox{Col}A[/latex], so [latex]\overrightarrow{b}-A\widehat{x}[/latex] is orthogonal to each column of [latex]A[/latex], i.e. [latex]\overrightarrow{a_{j}}\cdot(\overrightarrow{b}-A\widehat{x})=0[/latex] for every column [latex]\overrightarrow{a_{j}}[/latex] of [latex]A[/latex]. That is, [latex]A^{T}(\overrightarrow{b}-A\widehat{x})=\overrightarrow{0}[/latex], which is equivalent to [latex]A^{T}A\widehat{x}=A^{T}\overrightarrow{b}[/latex].

 

Definition: [latex]A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}[/latex] is called the normal equations for [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex]. A solution of [latex]A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}[/latex] is often denoted by [latex]\widehat{x}[/latex].

 

Theorem: The set of least-squares solutions of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] coincides with the nonempty set of solutions of the normal equations [latex]A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}[/latex].

 

 

Example 5: Find a least-squares solution of the inconsistent system [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] for

[latex]A=\left[\begin{array}{cc} 3 & 0\\ 0 & 2\\ -1 & 1 \end{array}\right][/latex] and [latex]\overrightarrow{b}=\left[\begin{array}{c} 1\\ 0\\ 3 \end{array}\right][/latex].
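
One possible solution via the normal equations: here

[latex]A^{T}A=\left[\begin{array}{cc} 10 & -1\\ -1 & 5 \end{array}\right][/latex] and [latex]A^{T}\overrightarrow{b}=\left[\begin{array}{c} 0\\ 3 \end{array}\right][/latex],

and solving [latex]A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}[/latex] (the coefficient matrix is invertible, with determinant [latex]49[/latex]) gives [latex]\widehat{x}=\frac{1}{49}\left[\begin{array}{c} 3\\ 30 \end{array}\right]=\left[\begin{array}{c} 3/49\\ 30/49 \end{array}\right][/latex].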

 

 

Exercise 5: Find a least-squares solution of the inconsistent system [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] for [latex]A=\left[\begin{array}{cc} 2 & 1\\ -1 & 0\\ 0 & 2 \end{array}\right][/latex] and [latex]\overrightarrow{b}=\left[\begin{array}{c} 0\\ 1\\ 2 \end{array}\right][/latex].

Theorem: Let [latex]A[/latex] be an [latex]m\times n[/latex] matrix. The following statements are logically equivalent:

(a) The equation [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] has a unique least-squares solution for each [latex]\overrightarrow{b}[/latex] in [latex]\mathbb{R}^{m}[/latex].

(b) The columns of [latex]A[/latex] are linearly independent.

(c) The matrix [latex]A^{T}A[/latex] is invertible.

When these statements are true, the least-squares solution [latex]\widehat{x}[/latex] is given by [latex]\widehat{x}=(A^{T}A)^{-1}A^{T}\overrightarrow{b}[/latex].

 

Remark: When a least-squares solution [latex]\widehat{x}[/latex] is used to produce [latex]A\widehat{x}=\widehat{b}[/latex] as an approximation to [latex]\overrightarrow{b}[/latex], the distance from [latex]\overrightarrow{b}[/latex] to [latex]A\widehat{x}=\widehat{b}[/latex] is called the least-squares error of this approximation.

 

Example 6: Find a least-squares solution of the inconsistent system [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] for
[latex]A=\left[\begin{array}{cc} 1 & 1\\ 1 & 0\\ 0 & 2\\ -1 & 1 \end{array}\right][/latex] and [latex]\overrightarrow{b}=\left[\begin{array}{c} 1\\ 0\\ 2\\ 3 \end{array}\right][/latex].
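
One possible solution: the columns of [latex]A[/latex] happen to be orthogonal (their dot product is [latex]1+0+0-1=0[/latex]), so [latex]A^{T}A[/latex] is diagonal:

[latex]A^{T}A=\left[\begin{array}{cc} 3 & 0\\ 0 & 6 \end{array}\right][/latex], [latex]A^{T}\overrightarrow{b}=\left[\begin{array}{c} -2\\ 8 \end{array}\right][/latex], and the normal equations give [latex]\widehat{x}=\left[\begin{array}{c} -2/3\\ 4/3 \end{array}\right][/latex].

To illustrate the preceding remark, [latex]\overrightarrow{b}-A\widehat{x}=\left[\begin{array}{c} 1/3\\ 2/3\\ -2/3\\ 1 \end{array}\right][/latex], so the least-squares error of this approximation is [latex]||\overrightarrow{b}-A\widehat{x}||=\sqrt{\frac{1}{9}+\frac{4}{9}+\frac{4}{9}+1}=\sqrt{2}[/latex].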

 

 

Exercise 6: Find a least-squares solution of the inconsistent system [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] for
[latex]A=\left[\begin{array}{cc} 0 & 1\\ 1 & 0\\ -2 & 1\\ -1 & -2 \end{array}\right][/latex] and [latex]\overrightarrow{b}=\left[\begin{array}{c} 1\\ -1\\ 0\\ 2 \end{array}\right][/latex].

 

GroupWork 1: True or False. [latex]A[/latex] is an [latex]m\times n[/latex] matrix and [latex]\overrightarrow{b}[/latex] is in [latex]\mathbb{R}^{m}[/latex].

a. The general least squares problem is to find an [latex]\overrightarrow{x}[/latex] that makes [latex]A\overrightarrow{x}[/latex] as close as possible to [latex]\overrightarrow{b}[/latex].

b. A least squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] is a vector [latex]\widehat{x}[/latex] that satisfies [latex]A\widehat{x}=\widehat{b}[/latex] where [latex]\widehat{b}[/latex] is the orthogonal projection of [latex]\overrightarrow{b}[/latex] onto [latex]\mbox{Col}A[/latex].

c. A least squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] is a vector [latex]\widehat{x}[/latex] such that [latex]||\overrightarrow{b}-A\overrightarrow{x}||\leq||\overrightarrow{b}-A\widehat{x}||[/latex] for all [latex]\overrightarrow{x}[/latex] in [latex]\mathbb{R}^{n}[/latex].

d. Any solution of [latex]A^{T}A\overrightarrow{x}=A^{T}\overrightarrow{b}[/latex] is a least squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex].

e. If the columns of [latex]A[/latex] are linearly independent, then the equation [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] has exactly one least squares solution.

f. For each [latex]\overrightarrow{y}[/latex] and each subspace [latex]W[/latex], the vector [latex]\overrightarrow{y}-\mbox{proj}_{W}\overrightarrow{y}[/latex] is orthogonal to [latex]W[/latex].

g. The orthogonal projection [latex]\widehat{y}[/latex] of [latex]\overrightarrow{y}[/latex] onto a subspace [latex]W[/latex] can sometimes depend on the orthogonal basis for [latex]W[/latex] used to compute [latex]\widehat{y}[/latex].

 

GroupWork 2: Find a formula for the least squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] when the columns of [latex]A[/latex] are orthonormal.

 

GroupWork 3: True or False. [latex]A[/latex] is an [latex]m\times n[/latex] matrix and [latex]\overrightarrow{b}[/latex] is in [latex]\mathbb{R}^{m}[/latex].

a. If [latex]\overrightarrow{b}[/latex] is in the column space of [latex]A[/latex], then every solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] is a least squares solution.

b. The least squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] is the point in the column space of [latex]A[/latex] closest to [latex]\overrightarrow{b}[/latex].

c. A least squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex] is a list of weights that, when applied to the columns of [latex]A[/latex], produces the orthogonal projection of [latex]\overrightarrow{b}[/latex] onto [latex]\mbox{Col}A[/latex].

d. If [latex]\widehat{x}[/latex] is a least squares solution of [latex]A\overrightarrow{x}=\overrightarrow{b}[/latex], then [latex]\widehat{x}=(A^{T}A)^{-1}A^{T}\overrightarrow{b}[/latex].

e. If [latex]\overrightarrow{y}[/latex] is in a subspace [latex]W[/latex], then the orthogonal projection of [latex]\overrightarrow{y}[/latex] onto [latex]W[/latex] is [latex]\overrightarrow{y}[/latex] itself.

f. The best approximation to [latex]\overrightarrow{y}[/latex] by elements of a subspace [latex]W[/latex] is given by the vector [latex]\overrightarrow{y}-\mbox{proj}_{W}\overrightarrow{y}[/latex].

 

GroupWork 4: Describe all least squares solutions of the system

[latex]\begin{array}{rcl} x+y & = & 2\\ x+y & = & 4 \end{array}[/latex]

 

License


Matrices Copyright © 2019 by Kuei-Nuan Lin is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
