Highlights of Linear Algebra

Matrix multiplication, eigenvalues, singular values, factorization, and optimization.

By Vamshi Jandhyala in mathematics, machine learning, linear algebra

September 1, 2021

Multiplication $Ax$ using Columns of $A$

$A\boldsymbol{x}$ is a linear combination of columns of $A$.

Linear combinations of the columns of $A$ produce the $\textbf{column space}$ of $A$.

$b$ is in the column space of $A$ when $A\boldsymbol{x}=b$ has a solution.
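A quick numerical check of this column picture (a minimal NumPy sketch; the matrix and vector are made up for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([2.0, -1.0])

direct = A @ x                             # matrix-vector product
combo = x[0] * A[:, 0] + x[1] * A[:, 1]    # the same linear combination of the columns of A

assert np.allclose(direct, combo)          # both equal [0, 2, 4]
```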

A column of $A$ is $\textbf{dependent}$ if it is a linear combination of the other columns of $A$; otherwise it is $\textbf{independent}$.

Independent columns and the rank of $A$

A $basis$ for a subspace is a full set of $independent$ vectors. All vectors in the space are linear combinations of the basis vectors.

Finding a basis $C$ for the column space of $A$

If column $1$ of $A$ is not all zero, put it into matrix $C$.

If column $2$ of $A$ is not a multiple of column $1$, put it into $C$.

If column $3$ of $A$ is not a combination of columns $1$ and $2$, put it into $C$. Continue.

At the end $C$ will have $r$ columns $(r \leq n)$.

They will be the $basis$ for the column space of $A$.

The number $r$ is the $\textbf{rank}$ of $A$. It is also the rank of $C$. It counts independent columns. The number $r$ is the dimension of the column space of $A$ and of $C$ (the same space).

$\textbf{The rank of a matrix is the dimension of its column space}.$

The matrix $C$ connects to $A$ by a third matrix $R$: $A=CR$. If $A$ is $m \times n$, then $C$ is $m \times r$ and $R$ is $r \times n$.

$R =$ row-reduced echelon form of $A$ (without zero rows).

$\textbf{Rank Theorem: The number of independent columns equals the number of independent rows}.$

This rank theorem is true for every matrix. The theorem is proved by $A=CR$. The matrix $R$ has $r$ rows, and multiplying by $C$ takes combinations of those rows. Since $A=CR$, we get every row of $A$ from the $r$ rows of $R$. And those $r$ rows are independent, so they are a basis for the row space of $A$. The column space and row space of $A$ both have dimension $r$, with $r$ basis vectors: the columns of $C$ and the rows of $R$.
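A sketch of the $A=CR$ factorization using SymPy's `rref` (the matrix is made up; its third column is a combination of the first two):

```python
import sympy as sp

A = sp.Matrix([[1, 3, 8],
               [1, 2, 6],
               [0, 1, 2]])

R_full, pivots = A.rref()                          # reduced row echelon form and pivot column indices
C = A.extract(list(range(A.rows)), list(pivots))   # the independent (pivot) columns of A
R = R_full[:len(pivots), :]                        # the nonzero rows of the rref

assert A == C * R                                  # A = CR
print(A.rank(), C.shape, R.shape)                  # 2 (3, 2) (2, 3)
```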

Matrix-matrix multiplication $AB$

$\textbf{Inner products}$ (rows times columns) produce each of the entries of $AB$.

The other way to multiply $AB$ is columns of $A$ times rows of $B$ (outer product).

Let $\mathbf{a_1}, \mathbf{a_2}, \dots, \mathbf{a_n}$ be the $n$ columns of $A$. Then $B$ must have $n$ rows $\mathbf{b_1^*}, \mathbf{b_2^*},\dots, \mathbf{b_n^*}$. Their product $AB$ is the sum of columns $\mathbf{a_k}$ times rows $\mathbf{b_k^*}$:

$$ \begin{aligned} AB &= \ \begin{bmatrix} \vert & &\vert \\
\mathbf{a}_1 & \dots &\mathbf{a}_n \\
\vert & &\vert \end{bmatrix} \begin{bmatrix} \text{—} & \mathbf{b}_1^* &\text{—} \\
& \vdots & \\
\text{—} & \mathbf{b}_n^* &\text{—} \end{bmatrix} \\
&= \mathbf{a_1}\mathbf{b_1^*} + \dots + \mathbf{a_n}\mathbf{b_n^*} \text{ (sum of rank $1$ matrices)} \end{aligned} $$
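This column-times-row view can be checked directly (a minimal NumPy sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # columns a_1, a_2, a_3
B = rng.standard_normal((3, 5))   # rows b_1*, b_2*, b_3*

# Sum of rank-1 outer products a_k b_k*
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))

assert np.allclose(A @ B, outer_sum)
```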

In data science, we are looking for the important pieces of a matrix $A$, and those pieces are rank-one matrices. A dominant theme in applied linear algebra is

$\textbf{Factor $A$ into $CR$ and look at the pieces $\mathbf{c_k}\mathbf{r_k^*}$ of $A=CR$}$.

Important Factorizations

The five important factorizations (illustrated with a numerical sketch after the list) are:

  1. $A=LU$ comes from $\textbf{elimination}$. Combinations of rows take $A$ to $U$ and $U$ back to $A$. The matrix $L$ is lower triangular and $U$ is upper triangular.

  2. $A=QR$ comes from $\textbf{orthogonalizing}$ the columns $\mathbf{a_1}$ to $\mathbf{a_n}$ as in “Gram-Schmidt”. $Q$ has orthonormal columns ($Q^TQ=I$) and $R$ is upper triangular.

  3. $S=Q\Lambda Q^T$ comes from the $\textbf{eigenvalues}$ $\lambda_1 ,\dots, \lambda_n$ of a symmetric matrix $S=S^T$. Eigenvalues on the diagonal of $\Lambda$. $\textbf{Orthonormal eigenvectors}$ in the columns of $Q$.

  4. $A=X\Lambda X^{-1}$ is $\textbf{diagonalization}$ when $A$ is $n$ by $n$ with $n$ independent eigenvectors. Eigenvalues of $A$ on the diagonal of $\Lambda$. Eigenvectors of $A$ in the columns of $X$.

  5. $A=U\Sigma V^T$ is the $\textbf{Singular Value Decomposition}$ of any matrix $A$. $\textbf{Singular values}$ $\sigma_1, \dots, \sigma_r$ in $\Sigma$. Orthonormal $\textbf{singular vectors}$ in the columns of $U$ and $V$.
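A quick numerical sketch of all five on a random matrix (NumPy/SciPy; note that SciPy's `lu` returns a permutation $P$, so it actually produces $A = PLU$ to allow row exchanges):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
S = A + A.T                                  # a symmetric matrix

P, L, U = lu(A)                              # 1. A = P L U (P handles row exchanges)
Q, R = np.linalg.qr(A)                       # 2. A = Q R, with Q^T Q = I and R upper triangular
lam_s, Qs = np.linalg.eigh(S)                # 3. S = Q Lambda Q^T
lam, X = np.linalg.eig(A)                    # 4. A = X Lambda X^-1 (X invertible with probability 1)
U_svd, sigma, Vt = np.linalg.svd(A)          # 5. A = U Sigma V^T

assert np.allclose(P @ L @ U, A)
assert np.allclose(Q @ R, A)
assert np.allclose(Qs @ np.diag(lam_s) @ Qs.T, S)
assert np.allclose(X @ np.diag(lam) @ np.linalg.inv(X), A)
assert np.allclose(U_svd @ np.diag(sigma) @ Vt, A)
```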

Four fundamental subspaces

Every $m \times n$ matrix $A$ leads to four subspaces - two subspaces of $\mathbf{R}^m$ and two more of $\mathbf{R}^n$.

The column space $\mathbf{C}(A)$ contains all combinations of columns of $A$. The row space $\mathbf{C}(A^T)$ contains all combinations of columns of $A^T$. The nullspace $\mathbf{N}(A)$ contains all solutions $x$ of $Ax=0$. The left nullspace $\mathbf{N}(A^T)$ contains all solutions $y$ of $A^Ty=0$.

$\textbf{Counting Law: $r$ independent equations $Ax=0$ have $n-r$ independent solutions}.$

The column space $\mathbf{C}(A)$ with dimension $r$ is orthogonal to the left nullspace $\mathbf{N}(A^T)$ with dimension $m-r$.

The row space $\mathbf{C}(A^T)$ with dimension $r$ is orthogonal to the nullspace $\mathbf{N}(A)$ with dimension $n-r$.

The dimensions of the column space and nullspace add up to $n$.

The dimensions of the row space and left nullspace add up to $m$.
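A quick dimension count and orthogonality check (a minimal NumPy/SciPy sketch; the rank-one matrix is made up for illustration):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])         # m = 2, n = 3, rank 1

r = np.linalg.matrix_rank(A)
N = null_space(A)                        # basis of N(A), shape (n, n - r)
N_left = null_space(A.T)                 # basis of N(A^T), shape (m, m - r)

print(r, N.shape[1], N_left.shape[1])    # 1 2 1, so r + (n - r) = n and r + (m - r) = m
assert np.allclose(A @ N, 0)             # every row of A is orthogonal to the nullspace
assert np.allclose(A.T @ N_left, 0)      # every column of A is orthogonal to the left nullspace
```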

Key facts about rank

  1. Rank of $AB \leq$ rank of $A$.

  2. Rank of $A+B \leq$ (rank of $A$) + (rank of $B$).

  3. Rank of $A^TA$ = rank of $AA^T$ = rank of $A$ = rank of $A^T$.

  4. If $A$ is $m\times r$ and $B$ is $r \times n$, both with rank $r$, then $AB$ also has rank $r$ (these facts are checked numerically below).
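A numerical spot-check of these facts (a sketch with random matrices, which have full rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))           # rank 3
B = rng.standard_normal((3, 4))           # rank 3
B2 = rng.standard_normal((5, 3))          # rank 3
rank = np.linalg.matrix_rank

print(rank(A @ B) <= rank(A))                                   # fact 1: True
print(rank(A + B2) <= rank(A) + rank(B2))                       # fact 2: True
print(rank(A.T @ A) == rank(A @ A.T) == rank(A) == rank(A.T))   # fact 3: True
print(rank(A @ B) == 3)                                         # fact 4: A is 5x3, B is 3x4, both rank 3
```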

Elimination and $A=LU$

$A = L$ times $U$ is the matrix description of elimination without row exchanges.
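A minimal sketch of elimination that records the multipliers in $L$ (it assumes no row exchanges are needed; the example matrix is made up):

```python
import numpy as np

def lu_no_exchanges(A):
    """Elimination A -> U, saving the multipliers in L (assumes nonzero pivots)."""
    n = A.shape[0]
    U = A.astype(float)
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]       # multiplier l_ik
            U[i, k:] -= L[i, k] * U[k, k:]    # row_i <- row_i - l_ik * row_k
    return L, U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
L, U = lu_no_exchanges(A)
assert np.allclose(L @ U, A)                  # A = LU
```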

Orthogonal matrices and subspaces

  1. Orthogonal vectors $\boldsymbol{x}$ and $\boldsymbol{y}$: the test is $\boldsymbol{x}^T\boldsymbol{y} = x_1y_1 + \cdots + x_ny_n=0$.

  2. Orthogonal basis for a subspace: every pair of basis vectors has $\boldsymbol{v}_i^T\boldsymbol{v}_j=0$. Orthonormal basis: an orthogonal basis of unit vectors, so every $\boldsymbol{v}_i^T\boldsymbol{v}_i=1$. To go from orthogonal to orthonormal, just divide each basis vector $\boldsymbol{v}_i$ by $||\boldsymbol{v}_i||$.

  3. Orthogonal subspaces: the row space and the nullspace. Every vector in the row space is orthogonal to every vector in the nullspace. $A\boldsymbol{x}=0$ means (each row of $A$) $\cdot \boldsymbol{x} = 0$. Every row (and every combination of rows) is orthogonal to all $\boldsymbol{x}$ in the nullspace.

  4. Tall thin matrices $Q$ with orthonormal columns: $Q^TQ=I$. If this $Q$ multiplies any vector $\boldsymbol{x}$, the length of the vector doesn’t change: $$ ||Q\boldsymbol{x}|| = ||\boldsymbol{x}|| \text{ because $(Q\boldsymbol{x})^T(Q\boldsymbol{x})=\boldsymbol{x}^TQ^TQ\boldsymbol{x}=\boldsymbol{x}^T\boldsymbol{x}$} $$ If $m>n$ the $m$ rows cannot be orthogonal in $\mathbf{R}^n$. Tall thin matrices have $QQ^T \neq I$.

  5. Orthogonal matrices are square with orthonormal columns: $Q^T = Q^{-1}$. For square matrices $Q^TQ=I$ leads to $QQ^T=I$: the left inverse $Q^T$ is also the right inverse of $Q$. The columns of this orthogonal $n \times n$ matrix are an orthonormal basis for $\mathbf{R}^n$.

$\textbf{Every subspace of $\mathbf{R}^n$ has an orthogonal basis}$. Let $\mathbf{a}$ and $\mathbf{b}$ be two independent vectors in three-dimensional space $\mathbf{R}^3$. For an orthogonal basis, subtract away from $\mathbf{b}$ its component in the direction of $\mathbf{a}$:

Orthogonal basis: $\mathbf{a}$ and $\mathbf{c}$, where $\mathbf{c} = \mathbf{b} - \frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}\mathbf{a}$.
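A quick check of this Gram-Schmidt step (a minimal NumPy sketch with made-up vectors $\mathbf{a}$ and $\mathbf{b}$):

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([1.0, 0.0, 1.0])

# Subtract from b its component in the direction of a
c = b - (a @ b) / (a @ a) * a

assert np.isclose(a @ c, 0.0)    # a and c are orthogonal
```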

$\textbf{If $P^2=P = P^T$ then $Pb$ is the orthogonal projection of $b$ onto the column space of $P$}$.
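One standard way to build such a $P$ (an assumption added here, not stated above) is $P = QQ^T$ for any $Q$ with orthonormal columns; then $Pb$ projects onto the column space of $Q$, which is also the column space of $P$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 2))
Q, _ = np.linalg.qr(A)            # orthonormal basis for the column space of A
P = Q @ Q.T                       # projection onto that column space

assert np.allclose(P @ P, P)      # P^2 = P
assert np.allclose(P.T, P)        # P^T = P
b = rng.standard_normal(5)
assert np.allclose(Q.T @ (b - P @ b), 0)   # the error b - Pb is orthogonal to the column space
```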

Orthogonal Basis = Orthogonal axes in $\mathbf{R}^n$

Suppose the $n \times n$ orthogonal matrix $Q$ has columns $\mathbf{q}_1, \dots , \mathbf{q}_n$. Those unit vectors are a basis for $n$-dimensional space $\mathbf{R}^n$. Every vector $\mathbf{v}$ can be written as a combination of the basis vectors:

$$ \mathbf{v} = c_1\mathbf{q}_1 + \dots + c_n\mathbf{q}_n $$

Those $c_i\mathbf{q}_i$ are the components of $\mathbf{v}$ along the axes. They are the projections of $\mathbf{v}$ onto the axes! There is a simple formula for each number $c_1$ to $c_n$: coefficients in an orthonormal basis are $c_i = \mathbf{q}_i^T\mathbf{v}$.

This is the key application of orthogonal bases (for example the basis for Fourier series). When basis vectors are orthonormal, each coefficient $c_1$ to $c_n$ can be found separately!
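A minimal sketch of this decomposition (the orthogonal matrix $Q$ comes from a QR factorization of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # a random orthogonal matrix
v = rng.standard_normal(4)

c = Q.T @ v                        # all coefficients c_i = q_i^T v at once
assert np.allclose(Q @ c, v)       # v = c_1 q_1 + ... + c_n q_n
```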

Eigenvalues and Eigenvectors

The eigenvectors of $A$ don’t change direction when you multiply them by $A$. The output $Ax$ is on the same line as the input vector $x$.

$$ \begin{aligned} \boldsymbol{x} &= \text{eigenvector of } A \\
\lambda &= \text{eigenvalue of } A \\
A\boldsymbol{x} &= \lambda \boldsymbol{x} \end{aligned} $$

The eigenvector $\boldsymbol{x}$ is just multiplied by its eigenvalue $\lambda$. Multiply again by $A$ to see that $\boldsymbol{x}$ is also an eigenvector of $A^2$: $A^2 \boldsymbol{x} = \lambda^2 \boldsymbol{x}$.

Certainly $A^k \boldsymbol{x} = \lambda^k \boldsymbol{x}$ for all $k = 1, 2, 3, …$ And $A^{-1} x = \frac{1}{\lambda} \boldsymbol{x}$ provided $\lambda \neq 0$.
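A small numerical check of these facts (a made-up $2 \times 2$ matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, X = np.linalg.eig(A)          # eigenvalues 3 and 1
x = X[:, 0]                        # an eigenvector

assert np.allclose(A @ x, lam[0] * x)                   # A x = lambda x
assert np.allclose(A @ A @ x, lam[0] ** 2 * x)          # A^2 x = lambda^2 x
assert np.allclose(np.linalg.inv(A) @ x, x / lam[0])    # A^-1 x = x / lambda
```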

These eigenvectors are special vectors that depend on $A$. Most $n \times n$ matrices have $n$ independent eigenvectors $\boldsymbol{x}_1$ to $\boldsymbol{x}_n$ with $n$ different eigenvalues $\lambda_1$ to $\lambda_n$. In that case every $n$-dimensional vector $\boldsymbol{v}$ will be a combination of the eigenvectors:

$$ \begin{aligned} \text{Every $\boldsymbol{v}$, } \boldsymbol{v} &= c_1x_1 + \dots+c_nx_n \\
\text{Multiply by $A$, } A\boldsymbol{v}&= c_1\lambda_1x_1 + \dots+c_n\lambda_nx_n \\
\text{Multiply by $A^k$, } A^k\boldsymbol{v}&= c_1\lambda_1^kx_1 + \dots+c_n\lambda_n^kx_n \\
\end{aligned} $$

The matrix $A$ also controls a system of linear differential equations $d\mathbf{u}/dt = A\mathbf{u}$. The system starts at an initial vector $\mathbf{u}(0)$ when $t = 0$. Every eigenvector grows or decays or oscillates according to its own eigenvalue $\lambda$. Powers $\lambda^k$ are changed to exponentials $e^{\lambda t}$:

$$ \begin{aligned} \text{Starting vector, } \mathbf{u}(0) &= c_1x_1 + \dots + c_nx_n \\
\text{Solution vector, } \mathbf{u}(t) &= c_1e^{\lambda_1t}x_1 + \dots + c_ne^{\lambda_nt}x_n \\
\end{aligned} $$
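A sketch comparing this eigenvector solution with the matrix exponential $e^{At}\mathbf{u}(0)$ (the matrix, initial vector, and time are made up; `expm` is from SciPy):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])       # eigenvalues -1 and -2
u0 = np.array([1.0, 0.0])
t = 0.7

lam, X = np.linalg.eig(A)
c = np.linalg.solve(X, u0)         # u(0) = c_1 x_1 + c_2 x_2
u_eig = X @ (c * np.exp(lam * t))  # u(t) = sum of c_i e^{lambda_i t} x_i
u_exp = expm(A * t) @ u0           # reference solution e^{At} u(0)

assert np.allclose(u_eig, u_exp)
```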

Similar matrices

For every invertible matrix $B$, the eigenvalues of $BAB^{-1}$ are the same as the eigenvalues of $A$. The eigenvectors $\boldsymbol{x}$ of $A$ are multiplied by $B$ to give eigenvectors $B\boldsymbol{x}$ of $BAB^{-1}$: if $A\boldsymbol{x} =\lambda \boldsymbol{x}$ then $(BAB^{-1}) (B\boldsymbol{x}) = BA\boldsymbol{x} = B\lambda \boldsymbol{x} = \lambda (B\boldsymbol{x})$. The matrices $BAB^{-1}$ (for every invertible $B$) are “similar” to $A$: same eigenvalues. We use this idea to compute eigenvalues of large matrices (when the determinant of $A- \lambda I$ would be completely hopeless). The idea is to make $BAB^{-1}$ gradually into a triangular matrix. The eigenvalues are not changing and they gradually show up on the main diagonal of $BAB^{-1}$.
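A quick check that similar matrices share eigenvalues (a made-up triangular $A$ with eigenvalues $4, 2, 1$ and a random invertible $B$):

```python
import numpy as np

rng = np.random.default_rng(5)
A = np.array([[4.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])    # triangular: eigenvalues 4, 2, 1
B = rng.standard_normal((3, 3))    # invertible with probability 1
M = B @ A @ np.linalg.inv(B)       # similar to A

assert np.allclose(np.sort(np.linalg.eigvals(M).real), [1.0, 2.0, 4.0])

# Eigenvectors of M are B times eigenvectors of A
lam, X = np.linalg.eig(A)
assert np.allclose(M @ (B @ X[:, 0]), lam[0] * (B @ X[:, 0]))
```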

Diagonalizing a matrix

Suppose $A$ has a full set of $n$ independent eigenvectors. (Most matrices do, but not all matrices.) Put those eigenvectors $\boldsymbol{x}_1 \dots \boldsymbol{x}_n$ into an invertible matrix $X$. Then multiply $AX$ column by column to get the columns $A\boldsymbol{x}_1$ to $A\boldsymbol{x}_n$. That matrix splits into $X$ times $\Lambda$.

$$ \begin{aligned} A \begin{bmatrix} \boldsymbol{x}_1 & \dots & \boldsymbol{x}_n \end{bmatrix} &= \begin{bmatrix} A\boldsymbol{x}_1 & \dots & A\boldsymbol{x}_n \end{bmatrix} \\
&= \begin{bmatrix} \lambda_1 \boldsymbol{x}_1 & \dots & \lambda_n \boldsymbol{x}_n \end{bmatrix} \\
&= \begin{bmatrix} \boldsymbol{x}_1 & \dots & \boldsymbol{x}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\
& \ddots & \\
& & \lambda_n \end{bmatrix} \end{aligned} $$

The eigenvalue matrix $\Lambda$ goes on the right of $X$, because the $\lambda$’s in $\Lambda$ multiply the columns of $X$. That equation $AX = X \Lambda$ tells us that $A = X \Lambda X^{-1}$ . If we know the eigenvalues and eigenvectors, we know the matrix $A$. And we can easily compute powers of $A$: $A^k = X \Lambda^kX^{-1}$.
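A small sketch of $A = X\Lambda X^{-1}$ and $A^k = X\Lambda^k X^{-1}$ (a made-up matrix with eigenvalues $2$ and $3$):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                                   # eigenvalues 2 and 3
lam, X = np.linalg.eig(A)

assert np.allclose(X @ np.diag(lam) @ np.linalg.inv(X), A)   # A = X Lambda X^-1
k = 5
assert np.allclose(X @ np.diag(lam**k) @ np.linalg.inv(X),
                   np.linalg.matrix_power(A, k))             # A^k = X Lambda^k X^-1
```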

Symmetric Positive Definite Matrices

  1. All $n$ eigenvalues $\lambda$ of a symmetric matrix $S$ are real numbers.

  2. The $n$ eigenvectors $q$ can be chosen orthogonal (perpendicular to each other).

Then those eigenvectors $\boldsymbol{q}_1, \dots, \boldsymbol{q}_n$ are not just orthogonal, they are $\textbf{orthonormal}$. The eigenvector matrix for $S$ has $Q^TQ= I$: orthonormal columns in $Q$. We write $Q$ instead of $X$ for the eigenvector matrix of $S$, to emphasize that these eigenvectors are orthonormal: $Q^T Q = I$ and $Q^T = Q^{-1}$. This eigenvector matrix is an orthogonal matrix. The usual $A = X \Lambda X^{-1}$ becomes $S = Q \Lambda Q^T$:

Spectral Theorem - Every real symmetric matrix has the form $S= Q \Lambda Q^T$.
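A numerical illustration of the spectral theorem (a random symmetric matrix; `eigh` is NumPy's routine for symmetric/Hermitian matrices):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((4, 4))
S = M + M.T                                      # a real symmetric matrix

lam, Q = np.linalg.eigh(S)                       # real eigenvalues, orthonormal eigenvectors
assert np.allclose(Q.T @ Q, np.eye(4))           # Q^T Q = I
assert np.allclose(Q @ np.diag(lam) @ Q.T, S)    # S = Q Lambda Q^T
```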

References

Linear Algebra and Learning from Data