## Multiplication $Ax$ using Columns of $A$

$A\boldsymbol{x}$ is a linear combination of columns of $A$.

Linear combinations of all the columns of $A$ produce the $\textbf{column space}$ of $A$.

The vector $b$ is in the column space of $A$ exactly when $A\boldsymbol{x}=b$ has a solution.

A column of $A$ is $\textbf{dependent}$ if it is a linear combination of the other columns of $A$; otherwise it is $\textbf{independent}$.
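A quick numerical check of the column picture (a sketch with numpy and a small made-up matrix): $A\boldsymbol{x}$ computed as a matrix-vector product equals the combination $x_1(\text{column }1) + x_2(\text{column }2)$.

```python
import numpy as np

# A small made-up example: A is 3x2, x has one coefficient per column.
A = np.array([[2.0, 1.0],
              [3.0, 1.0],
              [5.0, 7.0]])
x = np.array([2.0, 3.0])

# Matrix-vector product ...
Ax = A @ x

# ... equals the same linear combination of the columns of A.
combo = x[0] * A[:, 0] + x[1] * A[:, 1]

print(Ax, combo)  # both are [ 7.  9. 31.]
```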

### Independent columns and the rank of $A$

A $basis$ for a subspace is a full set of $independent$ vectors. All vectors in the space are linear combinations of the basis vectors.

#### Finding the basis $C$ of a column space of $A$

If column $1$ of $A$ is not all zero, put it into matrix $C$.

If column $2$ of $A$ is not a multiple of column $1$, put it into $C$.

If column $3$ of $A$ is not a combination of columns $1$ and $2$, put it into $C$. Continue.

At the end $C$ will have $r$ columns $(r \leq n)$.

They will be the $basis$ for the column space of $A$.

The number $r$ is the $\textbf{rank}$ of $A$. It is also the rank of $C$. It counts independent columns. The number $r$ is the $\text{dimension}$ of the column space of $A$ and of $C$ (same space).
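A minimal sketch of this column-selection procedure in numpy (using `numpy.linalg.matrix_rank` as the independence test in place of hand elimination):

```python
import numpy as np

def column_basis(A, tol=1e-10):
    """Collect independent columns of A into C, scanning left to right."""
    kept = []
    for j in range(A.shape[1]):
        trial = kept + [A[:, j]]
        # Keep column j only if it is NOT a combination of the columns kept so far.
        if np.linalg.matrix_rank(np.column_stack(trial), tol=tol) == len(trial):
            kept.append(A[:, j])
    return np.column_stack(kept)

# Column 3 = column 1 + column 2, so it is dependent and gets skipped: rank r = 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
C = column_basis(A)
print(C.shape)  # (3, 2)
```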

$\textbf{The rank of a matrix is the dimension of its column space}$. The matrix $C$ connects to $A$ by a third matrix $R$: $A=CR$. If the shape of $A$ is $m \times n$, then the shape of $C$ is $m \times r$ and the shape of $R$ is $r \times n$. $R$ = row-reduced echelon form of $A$ (without zero rows).

$\textbf{Rank Theorem: The number of independent columns equals the number of independent rows.}$

This rank theorem is true for every matrix. The theorem is proved by $A=CR$. The matrix $R$ has $r$ rows; multiplying by $C$ takes combinations of those rows. Since $A=CR$, we get every row of $A$ from the $r$ rows of $R$. And those $r$ rows are independent, so they are a basis for the row space of $A$. The column space and row space of $A$ both have dimension $r$, with $r$ basis vectors - the columns of $C$ and the rows of $R$.

## Matrix-matrix multiplication $AB$

$\textbf{Inner products}$ (rows times columns) produce each of the numbers in $AB=C$. The other way to multiply $AB$ is columns of $A$ times rows of $B$ (outer products). Let $\mathbf{a_1}, \mathbf{a_2}, \dots, \mathbf{a_n}$ be the $n$ columns of $A$. Then $B$ must have $n$ rows $\mathbf{b_1^*}, \mathbf{b_2^*}, \dots, \mathbf{b_n^*}$. Their product $AB$ is the sum of columns $\mathbf{a_k}$ times rows $\mathbf{b_k^*}$:

$$
\begin{aligned}
AB &= \begin{bmatrix} \vert & &\vert \\ \mathbf{a}_1 & \dots &\mathbf{a}_n \\ \vert & &\vert \end{bmatrix}
\begin{bmatrix} \text{—} & \mathbf{b}_1^* &\text{—} \\ & \vdots & \\ \text{—} & \mathbf{b}_n^* &\text{—} \end{bmatrix} \\
&= \mathbf{a_1}\mathbf{b_1^*} + \dots + \mathbf{a_n}\mathbf{b_n^*} \text{ (a sum of rank 1 matrices)}
\end{aligned}
$$

In data science we look for the important part of a matrix $A$, and those pieces are rank one matrices. A dominant theme in applied linear algebra is: $\textbf{Factor $A$ into $CR$ and look at the pieces $\mathbf{c_k}\mathbf{r_k^*}$ of $A=CR$}$.

### Important Factorizations

The five important factorizations are:

1. $A=LU$ comes from $\textbf{elimination}$. Combinations of rows take $A$ to $U$ and $U$ back to $A$. The matrix $L$ is lower triangular and $U$ is upper triangular.
2. $A=QR$ comes from $\textbf{orthogonalizing}$ the columns $\mathbf{a_1}$ to $\mathbf{a_n}$ as in "Gram-Schmidt". $Q$ has orthonormal columns ($Q^TQ=I$) and $R$ is upper triangular.
3. $S=Q\Lambda Q^T$ comes from the $\textbf{eigenvalues}$ $\lambda_1, \dots, \lambda_n$ of a symmetric matrix $S=S^T$. Eigenvalues on the diagonal of $\Lambda$. $\textbf{Orthonormal eigenvectors}$ in the columns of $Q$.
4. $A=X\Lambda X^{-1}$ is $\textbf{diagonalization}$ when $A$ is $n$ by $n$ with $n$ independent eigenvectors. Eigenvalues of $A$ on the diagonal of $\Lambda$. Eigenvectors of $A$ in the columns of $X$.
5. $A=U\Sigma V^T$ is the $\textbf{Singular Value Decomposition}$ of any matrix $A$. $\textbf{Singular values}$ $\sigma_1, \dots, \sigma_r$ in $\Sigma$. Orthonormal $\textbf{singular vectors}$ in $U$ and $V$.

## Four fundamental subspaces

Every $m \times n$ matrix $A$ leads to four subspaces - two subspaces of $\mathbf{R}^m$ and two more of $\mathbf{R}^n$.

- The column space $\mathbf{C}(A)$ contains all combinations of the columns of $A$.
- The row space $\mathbf{C}(A^T)$ contains all combinations of the columns of $A^T$.
- The nullspace $\mathbf{N}(A)$ contains all solutions $x$ of $Ax=0$.
- The left nullspace $\mathbf{N}(A^T)$ contains all solutions $y$ of $A^Ty=0$.

$\textbf{Counting Law: $r$ independent equations $Ax=0$ have $n-r$ independent solutions.}$

The column space $\mathbf{C}(A)$ with dimension $r$ is orthogonal to the left nullspace $\mathbf{N}(A^T)$ with dimension $m-r$. The row space $\mathbf{C}(A^T)$ with dimension $r$ is orthogonal to the nullspace $\mathbf{N}(A)$ with dimension $n-r$. The dimensions of the column space and nullspace add up to $n$. The dimensions of the row space and left nullspace add up to $m$.

### Key facts about ranks

1. Rank of $AB$ $\leq$ rank of $A$.
2. Rank of $A+B$ $\leq$ (rank of $A$) + (rank of $B$).
3. Rank of $A^TA$ = rank of $AA^T$ = rank of $A$ = rank of $A^T$.
4. If $A$ is $m\times r$ and $B$ is $r \times n$, both with rank $r$, then $AB$ also has rank $r$.
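The rank facts can be checked numerically, a sketch with numpy on small random matrices (`numpy.linalg.matrix_rank` estimates rank from singular values):

```python
import numpy as np

rng = np.random.default_rng(0)
rank = np.linalg.matrix_rank

# Fact 3: rank(A^T A) = rank(A A^T) = rank(A) = rank(A^T).
A = rng.standard_normal((4, 6))   # random 4x6: full row rank 4 here
assert rank(A.T @ A) == rank(A @ A.T) == rank(A) == rank(A.T)

# Fact 4: A is m x r and B is r x n, both with rank r => AB also has rank r.
r = 2
Amr = rng.standard_normal((5, r))   # rank 2 (random Gaussian columns)
Brn = rng.standard_normal((r, 7))   # rank 2 (random Gaussian rows)
print(rank(Amr), rank(Brn), rank(Amr @ Brn))  # 2 2 2
```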
## Elimination and $A=LU$

$A = L$ times $U$ is the matrix description of elimination without row exchanges.

## Orthogonal matrices and subspaces

1. Orthogonal vectors $\boldsymbol{x}$ and $\boldsymbol{y}$: the test is $\boldsymbol{x}^T\boldsymbol{y} = x_1y_1 + \cdots + x_ny_n = 0$.
2. Orthogonal basis for a subspace: every pair of basis vectors has $\boldsymbol{v}_i^T\boldsymbol{v}_j=0$. Orthonormal basis: an orthogonal basis of unit vectors, so every $\boldsymbol{v}_i^T\boldsymbol{v}_i=1$. To go from orthogonal to orthonormal, just divide each basis vector $\boldsymbol{v}_i$ by $||\boldsymbol{v}_i||$.
3. Orthogonal subspaces: row space and nullspace. Every vector in the row space is orthogonal to every vector in the nullspace. $A\boldsymbol{x}=0$ means (each row) $\cdot\ \boldsymbol{x} = 0$. Every row (and every combination of rows) is orthogonal to all $\boldsymbol{x}$ in the nullspace.
4. Tall thin matrices $Q$ with orthonormal columns: $Q^TQ=I$. If this $Q$ multiplies any vector $\boldsymbol{x}$, the length of the vector doesn't change:
   $$||Q\boldsymbol{x}|| = ||\boldsymbol{x}|| \text{ because } (Q\boldsymbol{x})^T(Q\boldsymbol{x})=\boldsymbol{x}^TQ^TQ\boldsymbol{x}=\boldsymbol{x}^T\boldsymbol{x}$$
   If $m>n$, the $m$ rows cannot be orthogonal in $\mathbf{R}^n$. Tall thin matrices have $QQ^T \neq I$.
5. Orthogonal matrices are square with orthonormal columns: $Q^T = Q^{-1}$. For square matrices, $Q^TQ=I$ leads to $QQ^T=I$: the left inverse $Q^T$ is also the right inverse of $Q$. The columns of this orthogonal $n \times n$ matrix are an orthonormal basis for $\mathbf{R}^n$.

$\textbf{Every subspace of $\mathbf{R}^n$ has an orthogonal basis}$. Let $\mathbf{a}$ and $\mathbf{b}$ be two independent vectors in three-dimensional space $\mathbf{R}^3$.
For an orthogonal basis, subtract away from $\mathbf{b}$ its component in the direction of $\mathbf{a}$. The orthogonal basis is $\mathbf{a}$ and $\mathbf{c}$, where

$$\mathbf{c} = \mathbf{b} - \frac{\mathbf{a}^T\mathbf{b}}{\mathbf{a}^T\mathbf{a}}\mathbf{a}$$

$\textbf{If $P^2=P = P^T$ then $Pb$ is the orthogonal projection of $b$ onto the column space of $P$}$.

### Orthogonal Basis = Orthogonal Axes in $\mathbf{R}^n$

Suppose the $n \times n$ orthogonal matrix $Q$ has columns $\mathbf{q}_1, \dots, \mathbf{q}_n$. Those unit vectors are a basis for $n$-dimensional space $\mathbf{R}^n$. Every vector $\mathbf{v}$ can be written as a combination of the basis vectors:

$$\mathbf{v} = c_1\mathbf{q}_1 + \dots + c_n\mathbf{q}_n$$

Those $c_i\mathbf{q}_i$ are the components of $\mathbf{v}$ along the axes. They are the projections of $\mathbf{v}$ onto the axes! There is a simple formula for each number $c_1$ to $c_n$:

Coefficients in an orthonormal basis: $c_i = \mathbf{q}_i^T\mathbf{v}$.

This is the key application of orthogonal bases (for example, the basis for Fourier series). When the basis vectors are orthonormal, each coefficient $c_1$ to $c_n$ can be found separately!

## Eigenvalues and Eigenvectors

The eigenvectors of $A$ don't change direction when you multiply them by $A$. The output $A\boldsymbol{x}$ is on the same line as the input vector $\boldsymbol{x}$:

$$\boldsymbol{x} = \text{eigenvector of } A, \quad \lambda = \text{eigenvalue of } A, \quad A\boldsymbol{x}=\lambda \boldsymbol{x}$$

The eigenvector $\boldsymbol{x}$ is just multiplied by its eigenvalue $\lambda$. Multiply again by $A$ to see that $\boldsymbol{x}$ is also an eigenvector of $A^2$: $A^2 \boldsymbol{x} = \lambda^2 \boldsymbol{x}$. Certainly $A^k \boldsymbol{x} = \lambda^k \boldsymbol{x}$ for all $k = 1, 2, 3, \dots$ And $A^{-1} \boldsymbol{x} = \frac{1}{\lambda} \boldsymbol{x}$, provided $\lambda \neq 0$. These eigenvectors are special vectors that depend on $A$. Most $n \times n$ matrices have $n$ independent eigenvectors $\boldsymbol{x}_1$ to $\boldsymbol{x}_n$ with $n$ different eigenvalues $\lambda_1$ to $\lambda_n$.
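A small numerical illustration with numpy (a made-up symmetric $2 \times 2$ matrix, so the eigenvalues and eigenvectors are real): `numpy.linalg.eig` returns the eigenvalues and eigenvectors, and $A\boldsymbol{x}=\lambda\boldsymbol{x}$, $A^2\boldsymbol{x}=\lambda^2\boldsymbol{x}$, $A^{-1}\boldsymbol{x}=\frac{1}{\lambda}\boldsymbol{x}$ can each be checked directly.

```python
import numpy as np

# A made-up 2x2 symmetric matrix with simple eigenvalues 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, X = np.linalg.eig(A)   # eigenvalues in lam, eigenvectors in columns of X
x0 = X[:, 0]

# A x = lambda x: multiplying by A just scales the eigenvector.
assert np.allclose(A @ x0, lam[0] * x0)

# Multiply again: x is also an eigenvector of A^2 with eigenvalue lambda^2.
assert np.allclose(A @ A @ x0, lam[0] ** 2 * x0)

# And an eigenvector of A^{-1} with eigenvalue 1/lambda (lambda != 0 here).
assert np.allclose(np.linalg.inv(A) @ x0, x0 / lam[0])

print(np.sort(lam))  # [1. 3.]
```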
In that case every $n$-dimensional vector $\boldsymbol{v}$ will be a combination of the eigenvectors:

$$
\begin{aligned}
\text{Every } \boldsymbol{v}\text{: } \boldsymbol{v} &= c_1\boldsymbol{x}_1 + \dots + c_n\boldsymbol{x}_n \\
\text{Multiply by } A\text{: } A\boldsymbol{v} &= c_1\lambda_1\boldsymbol{x}_1 + \dots + c_n\lambda_n\boldsymbol{x}_n \\
\text{Multiply by } A^k\text{: } A^k\boldsymbol{v} &= c_1\lambda_1^k\boldsymbol{x}_1 + \dots + c_n\lambda_n^k\boldsymbol{x}_n
\end{aligned}
$$

The matrix $A$ also controls a system of linear differential equations $d\mathbf{u}/dt = A\mathbf{u}$. The system starts at an initial vector $\mathbf{u}(0)$ when $t = 0$. Every eigenvector grows or decays or oscillates according to its own eigenvalue $\lambda$. Powers $\lambda^k$ are changed to exponentials $e^{\lambda t}$:

$$
\begin{aligned}
\text{Starting vector: } \mathbf{u}(0) &= c_1\boldsymbol{x}_1 + \dots + c_n\boldsymbol{x}_n \\
\text{Solution vector: } \mathbf{u}(t) &= c_1e^{\lambda_1 t}\boldsymbol{x}_1 + \dots + c_ne^{\lambda_n t}\boldsymbol{x}_n
\end{aligned}
$$

### Similar matrices

For every invertible matrix $B$, the eigenvalues of $BAB^{-1}$ are the same as the eigenvalues of $A$. The eigenvectors $\boldsymbol{x}$ of $A$ are multiplied by $B$ to give the eigenvectors $B\boldsymbol{x}$ of $BAB^{-1}$:

If $A\boldsymbol{x} =\lambda \boldsymbol{x}$ then $(BAB^{-1})(B\boldsymbol{x}) = BA\boldsymbol{x} = B\lambda \boldsymbol{x} = \lambda (B\boldsymbol{x})$.

The matrices $BAB^{-1}$ (for every invertible $B$) are "similar" to $A$: same eigenvalues. We use this idea to compute eigenvalues of large matrices (when the determinant of $A - \lambda I$ would be completely hopeless). The idea is to make $BAB^{-1}$ gradually into a triangular matrix. The eigenvalues are not changing, and they gradually show up on the main diagonal of $BAB^{-1}$.

### Diagonalizing a matrix

Suppose $A$ has a full set of $n$ independent eigenvectors. (Most matrices do, but not all matrices.) Put those eigenvectors $\boldsymbol{x}_1, \dots, \boldsymbol{x}_n$ into an invertible matrix $X$. Then multiply $AX$ column by column to get the columns $A\boldsymbol{x}_1$ to $A\boldsymbol{x}_n$. That matrix splits into $X$ times $\Lambda$:
$$
\begin{aligned}
A \begin{bmatrix} \boldsymbol{x}_1 & \dots & \boldsymbol{x}_n \end{bmatrix}
&= \begin{bmatrix} A\boldsymbol{x}_1 & \dots & A\boldsymbol{x}_n \end{bmatrix} \\
&= \begin{bmatrix} \lambda_1 \boldsymbol{x}_1 & \dots & \lambda_n \boldsymbol{x}_n \end{bmatrix} \\
&= \begin{bmatrix} \boldsymbol{x}_1 & \dots & \boldsymbol{x}_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}
\end{aligned}
$$

The eigenvalue matrix $\Lambda$ goes on the right of $X$, because the $\lambda$'s in $\Lambda$ multiply the columns of $X$. That equation $AX = X \Lambda$ tells us that $A = X \Lambda X^{-1}$. If we know the eigenvalues and eigenvectors, we know the matrix $A$. And we can easily compute powers of $A$: $A^k = X \Lambda^k X^{-1}$.

## Symmetric Positive Definite Matrices

1. All $n$ eigenvalues $\lambda$ of a symmetric matrix $S$ are real numbers.
2. The $n$ eigenvectors $\boldsymbol{q}$ can be chosen orthogonal (perpendicular to each other). Then those eigenvectors $\boldsymbol{q}_1, \dots, \boldsymbol{q}_n$ are not just orthogonal, they are $\textbf{orthonormal}$. The eigenvector matrix for $S$ has $Q^TQ = I$: orthonormal columns in $Q$.

We write $Q$ instead of $X$ for the eigenvector matrix of $S$, to emphasize that these eigenvectors are orthonormal: $Q^T Q = I$ and $Q^T = Q^{-1}$. This eigenvector matrix is an orthogonal matrix. The usual $A = X \Lambda X^{-1}$ becomes $S = Q \Lambda Q^T$.

$\textbf{Spectral Theorem: Every real symmetric matrix has the form $S = Q \Lambda Q^T$}$.
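A numpy sketch tying these results together (made-up small matrices): diagonalization $A = X\Lambda X^{-1}$ via `numpy.linalg.eig`, powers via $A^k = X\Lambda^k X^{-1}$, and the spectral theorem $S = Q\Lambda Q^T$ via `numpy.linalg.eigh` for a symmetric $S$.

```python
import numpy as np

# Diagonalization of a made-up 2x2 matrix with independent eigenvectors.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
lam, X = np.linalg.eig(A)                      # AX = X Lambda
assert np.allclose(A, X @ np.diag(lam) @ np.linalg.inv(X))

# Powers are easy: A^3 = X Lambda^3 X^{-1}.
assert np.allclose(np.linalg.matrix_power(A, 3),
                   X @ np.diag(lam ** 3) @ np.linalg.inv(X))

# Spectral theorem for a symmetric S: S = Q Lambda Q^T with orthonormal Q.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam_s, Q = np.linalg.eigh(S)                   # eigh: real lam_s, orthonormal Q
assert np.allclose(Q.T @ Q, np.eye(2))         # Q^T Q = I
assert np.allclose(S, Q @ np.diag(lam_s) @ Q.T)
print("A = X Lambda X^{-1} and S = Q Lambda Q^T verified")
```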

Linear Algebra and Learning from Data