2  Linear Algebra

2.1 Vectors

A vector of numbers is a one-dimensional arrangement of numbers (integers, real numbers, complex numbers, etc.).

A column-vector is a column arrangement of mathematical objects, e.g., \[\begin{bmatrix} v_1 \\ v_2 \\v_3\\ \vdots \\ v_n\end{bmatrix}.\]

Examples include a vector of integers: \(\begin{bmatrix} 1 \\ 2 \\3\\ \vdots \\ 20\end{bmatrix}\), a vector of real numbers: \(\begin{bmatrix} 1.234 \\ 2.345 \\3.456\\ \vdots \\ 20.212\end{bmatrix}\), and a vector of complex numbers: \(\begin{bmatrix} 1+2j \\ 2+3j \\4+5j\\ \vdots \\ 20+21j\end{bmatrix}\).

Sometimes we represent the entire vector using a symbol with boldface font, e.g., \[\boldsymbol{x}= \begin{bmatrix} 1 \\ 2 \\3\\ \vdots \\5 \end{bmatrix},\] and refer to the entire vector as \(\boldsymbol{x}\).

import numpy as np

X = np.array([1, 2, 3, 4, 5])  # This is an array with 5 elements.
print(X)
[1 2 3 4 5]

2.1.1 Dimension

The number of elements in a vector is the dimension of the vector. For example, \[\boldsymbol{x}= \begin{bmatrix} 1 \\ 2 \\3\\ \vdots \\10 \end{bmatrix},\] is ten dimensional. If all the entries are real numbers, then we compactly represent \(\boldsymbol{x}\in\mathcal{R}^{10}\), where \(\mathcal{R}^n\) is the space of n-dimensional vectors with real entries. A vector can have elements that are complex numbers as well. In that case, we will denote \(\boldsymbol{x}\in\mathcal{C}^{n}\), where \(\mathcal{C}^n\) is the space of n-dimensional vectors with complex entries.

Here is a Python code computing the dimension of a vector.

import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # This is an array with 10 elements.
print("Dimension of X is: ", X.shape)
Dimension of X is:  (10,)

2.1.2 Transpose

If a vector \(\boldsymbol{x}\in\mathcal{R}^n := \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}\) then the transpose of \(\boldsymbol{x}\) is denoted by \(\boldsymbol{x}^T\) and defined by \[ \boldsymbol{x}^T = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}.\] Note: \((\boldsymbol{x}^T)^T = \boldsymbol{x}\), i.e. the transpose of a transposed vector is the same vector.

Note: If \(\boldsymbol{x}\) is a column vector, \(\boldsymbol{x}^T\) is a row vector and vice versa.

Here is a Python code computing transpose of a vector.

import numpy as np

# Define a vector as a 1D NumPy array
vector = np.array([1, 2, 3, 4])

# Compute the transpose of the vector (which is still the same vector)
transpose_vector = vector.T

# Print the original vector and its transpose
print("Vector:", vector)

print("Transpose:", transpose_vector)
print(transpose_vector)
Vector: [1 2 3 4]
Transpose: [1 2 3 4]
[1 2 3 4]

NumPy does not distinguish between row and column vectors when a vector is stored as a one-dimensional array: the transpose of a 1D array is the same array, so it prints identically. If vectors are instead stored as two-dimensional arrays (matrices with a single row or column), transposing does change the printed shape.
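Here is a small sketch (using an illustrative \(4\times 1\) column vector stored as a 2D array) showing the difference.

import numpy as np

# A column vector represented as a 4 x 1 matrix (2D array)
column_vector = np.array([[1], [2], [3], [4]])

# Transposing a 2D array changes its shape: (4, 1) -> (1, 4)
row_vector = column_vector.T

print("Column vector (shape", column_vector.shape, "):")
print(column_vector)
print("Transpose (shape", row_vector.shape, "):")
print(row_vector)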

2.1.3 Operations

2.1.3.1 Negative of Vector

Given a vector \(\boldsymbol{x}= \begin{bmatrix}x_1 \\ \vdots \\ x_n \end{bmatrix}\), the negative of \(\boldsymbol{x}\) is \[-\boldsymbol{x}:= \begin{bmatrix}-x_1\\ \vdots \\ -x_n\end{bmatrix}.\]

2.1.3.2 Multiplication by a Scalar

Multiplication of a scalar with a vector is defined as \[ \alpha\boldsymbol{x}= \begin{bmatrix}\alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{bmatrix}, \] i.e., the scalar \(\alpha\) multiplies all the elements of \(\boldsymbol{x}\). Also, \[\alpha\boldsymbol{x}= \boldsymbol{x}\alpha.\]

2.1.3.3 Vector Equality

Vector equality between two vectors is defined elementwise, i.e., the condition \(\boldsymbol{x}= \boldsymbol{y}\) is equivalent to \(n\) conditions \(x_i = y_i\), for \(i=1,\cdots,n\). Clearly, \(\boldsymbol{x}\) and \(\boldsymbol{y}\) would have to have the same dimension.

2.1.3.4 Vector Addition

Given two vectors \(\boldsymbol{x}\in \mathcal{R}^n\) and \(\boldsymbol{y}\in\mathcal{R}^n\) with respective components \(x_i\) and \(y_i\), then \(\boldsymbol{z}:= \boldsymbol{x}+\boldsymbol{y}\) has components \(z_i := x_i + y_i\). That is \[\boldsymbol{z}:= \boldsymbol{x}+ \boldsymbol{y}= \begin{bmatrix}x_1\\ \vdots \\ x_n\end{bmatrix} + \begin{bmatrix}y_1\\ \vdots \\ y_n\end{bmatrix} = \begin{bmatrix}x_1 + y_1\\ \vdots \\ x_n + y_n\end{bmatrix} = \begin{bmatrix}z_1\\ \vdots \\ z_n\end{bmatrix} .\] Note: \(\boldsymbol{x}\) and \(\boldsymbol{y}\) must be of the same dimension.

Vector addition satisfies the following properties: \[\begin{align*} \boldsymbol{x}+ (\boldsymbol{y}+ \boldsymbol{z}) &= (\boldsymbol{x}+ \boldsymbol{y}) + \boldsymbol{z},\\ \alpha(\boldsymbol{x}+ \boldsymbol{y}) &= \alpha\boldsymbol{x}+ \alpha\boldsymbol{y}. \end{align*}\]

2.1.3.5 Vector Subtraction

Given two vectors \(\boldsymbol{x}\in \mathcal{R}^n\) and \(\boldsymbol{y}\in\mathcal{R}^n\) with respective components \(x_i\) and \(y_i\), then \(\boldsymbol{z}:= \boldsymbol{x}-\boldsymbol{y}= \boldsymbol{x}+ (-\boldsymbol{y})\).
Note: \(\boldsymbol{x}\) and \(\boldsymbol{y}\) must be of the same dimension.
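Here is a short sketch (the vectors and scalar are chosen only for illustration) showing these elementwise operations with NumPy.

import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
alpha = 2

print("Negative of x:", -x)                  # elementwise negation
print("alpha * x:", alpha * x)               # multiplication by a scalar
print("x + y:", x + y)                       # vector addition
print("x - y:", x - y)                       # vector subtraction
print("x equals y:", np.array_equal(x, y))   # vector equality (all entries equal)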

2.1.3.6 Inner Product

Inner product between vectors \(\boldsymbol{x}\in\mathcal{R}^n\) and \(\boldsymbol{y}\in\mathcal{R}^n\) is defined as \[\langle\boldsymbol{x},\boldsymbol{y}\rangle := \boldsymbol{x}^T\boldsymbol{y}= \boldsymbol{y}^T\boldsymbol{x}= \sum_{i=1}^n x_iy_i.\]
Note: Inner product results in a scalar.

Here is a Python code computing inner product of vectors.

import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([6, 7, 8, 9, 10])
print("Inner product between X and Y: ", X.dot(Y))
print("Inner product between Y and X: ", Y.dot(X)) # Should get the same result.
Inner product between X and Y:  130
Inner product between Y and X:  130

2.1.3.7 Outer Product

Outer product between two vectors \(\boldsymbol{x}\in\mathcal{R}^n\) and \(\boldsymbol{y}\in\mathcal{R}^m\) is defined as \[ \boldsymbol{x}\boldsymbol{y}^T = \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \begin{bmatrix}y_1 & y_2 & \cdots & y_m\end{bmatrix} = \begin{bmatrix} x_1y_1 & x_1y_2 & \cdots & x_1y_m\\ x_2y_1 & x_2y_2 & \cdots & x_2 y_m \\ \vdots & \vdots & & \vdots \\ x_n y_1 & x_n y_2 & \cdots & x_n y_m \end{bmatrix} \] Note: Clearly \(\boldsymbol{x}\boldsymbol{y}^T \neq \boldsymbol{y}\boldsymbol{x}^T\) in general.
Note: Outer product results in an \(n\times m\) matrix.

Here is a Python code computing outer product of vectors.

import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([6, 7, 8, 9, 10])
print("Outer product between X and Y: \n", np.multiply.outer(X, Y))
print("\nOuter product between Y and X: \n", np.multiply.outer(Y, X))
Outer product between X and Y: 
 [[ 6  7  8  9 10]
 [12 14 16 18 20]
 [18 21 24 27 30]
 [24 28 32 36 40]
 [30 35 40 45 50]]

Outer product between Y and X: 
 [[ 6 12 18 24 30]
 [ 7 14 21 28 35]
 [ 8 16 24 32 40]
 [ 9 18 27 36 45]
 [10 20 30 40 50]]

2.1.4 Length of a Vector

The length of a vector \(\boldsymbol{x}\in\mathcal{R}^n\) is denoted by \[\|\boldsymbol{x}\|_2 := \sqrt{x_1^2 + \cdots + x_n^2} = \sqrt{\boldsymbol{x}^T\boldsymbol{x}}.\] Therefore, the length of a vector is the square-root of the inner product with itself. Often we will drop the subscript \(2\) in \(\|\boldsymbol{x}\|_2\), when it is clear from context.

Here is a Python code computing length of a vector.

import numpy as np

X = np.array([1, 2, 3, 4, 5])
print("Length of vector X is: \n", np.sqrt(X.dot(X)))
Length of vector X is: 
 7.416198487095663

2.1.5 Vector Norms

A norm is a function \(\|\cdot\|: \mathcal{V} \rightarrow \mathcal{R}\) from a vector space \(\mathcal{V}\) over a field (typically \(\mathcal{R}^n\) or \(\mathcal{C}^n\)) to the non-negative real numbers. It satisfies the following properties:

  1. Non-negativity: \[\forall \boldsymbol{x}\in \mathcal{V}, \|\boldsymbol{x}\| \geq 0.\]

  2. Scalar Multiplication: \[\forall \alpha \text{ (a scalar) and } \forall \boldsymbol{x}\in \mathcal{V}, \|\alpha \boldsymbol{x}\| = |\alpha| \|\boldsymbol{x}\|.\]

  3. Triangle Inequality: \[\forall \boldsymbol{x}, \boldsymbol{y}\in \mathcal{V}, \|\boldsymbol{x}+ \boldsymbol{y}\| \leq \|\boldsymbol{x}\| + \|\boldsymbol{y}\|.\]

  4. Definiteness: \[\|\boldsymbol{x}\| = 0 \iff \boldsymbol{x}\text{ is the zero vector}.\]

In general, for \(p \geq 1\) we can define \[\|\boldsymbol{x}\|_p := \left(|x_1|^p + \cdots + |x_n|^p\right)^{\frac{1}{p}},\] which is the \(p\)-norm (or \(\ell_p\) norm) of \(\boldsymbol{x}\).

We have for \(p=1,2,\infty\):

  • Manhattan Norm (\(l_1\) Norm): \[\|\boldsymbol{x}\|_1 = |x_1| + |x_2| + \cdots + |x_n|.\]

  • Euclidean Norm (\(l_2\) Norm): \[\|\boldsymbol{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.\]

  • Maximum Norm (\(l_\infty\) Norm): \[\|\boldsymbol{x}\|_{\infty} = \max(|x_1|, |x_2|, \cdots, |x_n|).\]

Here is a Python code computing various vector norms.

import numpy as np

# Define a vector as a NumPy array
vector = np.array([1, 2, 3])

# Compute different vector norms
L1_norm = np.linalg.norm(vector, ord=1)  # L1 norm (Manhattan norm)
L2_norm = np.linalg.norm(vector, ord=2)  # L2 norm (Euclidean norm)
inf_norm = np.linalg.norm(vector, ord=np.inf)  # Infinity norm (Maximum norm)

# Print the original vector and its norms
print("Vector:")
print(vector)

print(f"L1 Norm: {L1_norm}")
print(f"L2 Norm: {L2_norm}")
print(f"Infinity Norm: {inf_norm}")
Vector:
[1 2 3]
L1 Norm: 6.0
L2 Norm: 3.7416573867739413
Infinity Norm: 3.0

2.2 Matrices

Matrices are two-dimensional arrangements of mathematical objects, such as real numbers, complex numbers, functions, etc.

An \(m\times n\) matrix is defined as \[ \boldsymbol{A}= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \] where \(a_{ij}\) is the matrix element in the \(i^\text{th}\) row and the \(j^\text{th}\) column. The bold-faced symbol \(\boldsymbol{A}\) is used to refer to the entire matrix.

2.2.1 Dimension

The dimension of a matrix is defined by the number of rows and the number of columns. A matrix with dimension \(m\times n\) has \(m\) rows and \(n\) columns. We will use the notation \(\boldsymbol{A}\in \mathcal{R}^{m\times n}\) to represent \(m\times n\) matrices with real entries.

A matrix of dimension \(m\times n\) is called square if \(m=n\).
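Here is a small sketch (with an illustrative \(2\times 3\) matrix) computing the dimension of a matrix with NumPy's shape attribute.

import numpy as np

# Define a 2 x 3 matrix
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# shape returns (number of rows, number of columns)
print("Dimension of A is:", A.shape)

# The matrix is square only if the two dimensions are equal
print("A is square:", A.shape[0] == A.shape[1])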

2.2.2 Transpose

Matrix transpose is denoted by \(\boldsymbol{A}^T\), and is defined by interchanging the rows and columns. For example,

\[ \boldsymbol{A}= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \text{ then } \boldsymbol{A}^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1}\\ a_{12} & a_{22} & \cdots & a_{m2}\\ \vdots & \vdots & & \vdots\\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}. \]

That is, if \(\boldsymbol{B}=\boldsymbol{A}^T\), then \(b_{ij} = a_{ji}\).

Therefore, if \(\boldsymbol{A}\) is an \(m\times n\) matrix, then \(\boldsymbol{A}^T\) is an \(n\times m\) matrix.

Here is a Python code computing matrix transpose.

import numpy as np

# Define a matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Compute the transpose
transpose_matrix = np.transpose(matrix)

# Alternatively, you can use the T attribute to compute the transpose
# transpose_matrix = matrix.T

# Print the original matrix and its transpose
print("Matrix:")
print(matrix)

print("Transpose:")
print(transpose_matrix)
Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Transpose:
[[1 4 7]
 [2 5 8]
 [3 6 9]]

2.2.3 Matrix Construction from Vectors

Given \(n\) vectors \(\boldsymbol{x}_i\), for \(i=1,\cdots,n\), each of dimension \(m\); we can construct an \(m\times n\) matrix as \[\boldsymbol{A}= \begin{bmatrix}\boldsymbol{x}_1 & \boldsymbol{x}_2 & \cdots & \boldsymbol{x}_n\end{bmatrix},\] and \[\boldsymbol{A}^T = \begin{bmatrix}\boldsymbol{x}^T_1 \\ \boldsymbol{x}^T_2 \\ \vdots \\ \boldsymbol{x}^T_n\end{bmatrix}.\]

Here is a Python code to construct matrices from vectors.

import numpy as np

# Define two vectors as NumPy arrays
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])

# Construct a matrix whose columns are the vectors (as in A above)
matrix_horizontal = np.hstack((vector1.reshape(-1, 1), vector2.reshape(-1, 1)))

# Construct a matrix whose rows are the vectors (as in A^T above)
matrix_vertical = np.vstack((vector1, vector2))

# Print the original vectors and the constructed matrices
print("Vector 1:")
print(vector1)

print("Vector 2:")
print(vector2)

print("Matrix with vectors as columns:")
print(matrix_horizontal)

print("Matrix with vectors as rows:")
print(matrix_vertical)
Vector 1:
[1 2 3]
Vector 2:
[4 5 6]
Matrix with vectors as columns:
[[1 4]
 [2 5]
 [3 6]]
Matrix with vectors as rows:
[[1 2 3]
 [4 5 6]]

2.2.4 Rank of a Matrix

The rank of a matrix is a fundamental concept in linear algebra and represents the maximum number of linearly independent rows or columns in the matrix. In other words, it quantifies the dimensionality of the vector space spanned by the rows or columns of the matrix. A matrix’s rank can provide important information about its properties and relationships between its rows and columns.

A matrix is said to have full rank if its rank equals the largest possible for a matrix of the same dimensions, which is the lesser of the number of rows and columns. A matrix is said to be rank-deficient if it does not have full rank. The rank deficiency of a matrix is the difference between the lesser of the number of rows and columns, and the rank.

The rank of a matrix can be determined by various methods, including row reduction (Gaussian elimination) and by counting the number of non-zero rows or columns in its reduced row echelon form (RREF).

Here is a Python code computing the rank of a matrix.

import numpy as np

# Define a matrix 
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Compute the rank of the matrix
rank = np.linalg.matrix_rank(matrix)

# Print the matrix and its rank
print("Matrix:")
print(matrix)

print(f"Rank: {rank}")
Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Rank: 2

2.2.5 Eigen-Values and Eigen-Vectors of a Matrix

Eigenvalues, also known as characteristic values or latent roots, are a fundamental concept in linear algebra and matrix theory. They are associated with square matrices and play a crucial role in various applications, including physics, engineering, data analysis, and more.

An eigenvalue of a square matrix \(\boldsymbol{A}\) is a scalar (a single number) \(\lambda\) such that when \(\boldsymbol{A}\) is multiplied by a certain nonzero vector \(\boldsymbol{v}\), the result is a scaled version of \(\boldsymbol{v}\):

\[ \boldsymbol{A}\boldsymbol{v} = \lambda\boldsymbol{v}.\]

In this equation:

  • \(\boldsymbol{A}\) is the square matrix.
  • \(\boldsymbol{v}\) is a nonzero vector called the eigenvector associated with the eigenvalue \(\lambda\).
  • \(\lambda\) is the eigenvalue.

In simpler terms, when you multiply a matrix by its eigenvector, the result is a vector that points in the same direction as the original eigenvector, but possibly with a different magnitude (scaled by \(\lambda\)).

To find the eigenvalues of a matrix in practice, we typically use numerical methods or specialized libraries like NumPy in Python. The eigenvalues can be computed using functions like numpy.linalg.eigvals in Python, which returns an array of eigenvalues for a given matrix.

Here is a Python code computing the eigenvalues of a matrix.

import numpy as np

# Define a square matrix
matrix = np.array([[1, 2],
                   [3, 4]])

# Compute the eigenvalues
eigenvalues = np.linalg.eigvals(matrix)

# Print the eigenvalues
print("Eigenvalues:")
print(eigenvalues)
Eigenvalues:
[-0.37228132  5.37228132]
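The eigenvectors can be obtained together with the eigenvalues using numpy.linalg.eig. Here is a small sketch (with the same illustrative matrix) that also verifies the defining relation \(\boldsymbol{A}\boldsymbol{v} = \lambda\boldsymbol{v}\) for one eigenpair.

import numpy as np

# Define a square matrix
matrix = np.array([[1, 2],
                   [3, 4]])

# numpy.linalg.eig returns the eigenvalues and the eigenvectors (as columns)
eigenvalues, eigenvectors = np.linalg.eig(matrix)

print("Eigenvalues:", eigenvalues)
print("Eigenvectors (as columns):")
print(eigenvectors)

# Verify A v = lambda v for the first eigenpair
v = eigenvectors[:, 0]
lam = eigenvalues[0]
print("A @ v      :", matrix @ v)
print("lambda * v :", lam * v)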

2.2.6 Operations

2.2.6.1 Negative Matrix

If \(\boldsymbol{A}\in\mathcal{R}^{m\times n}\) is a matrix with elements \(a_{ij}\), then \(-\boldsymbol{A}\) is defined by elements \(-a_{ij}\).

2.2.6.2 Multiplication by a Scalar

If \(\boldsymbol{A}\) is a matrix defined by elements \(a_{ij}\), then \(\alpha\boldsymbol{A}\) is defined by the elements \(\alpha a_{ij}\).

2.2.6.3 Matrix Equality

Matrix equality is defined only for two matrices with the same dimension, and is defined elementwise. For two matrices \(\boldsymbol{A}\in\mathcal{R}^{m\times n}\) and \(\boldsymbol{B}\in\mathcal{R}^{m\times n}\), the condition \(\boldsymbol{A}=\boldsymbol{B}\) is equivalent to \(mn\) conditions \(a_{ij} = b_{ij}\), for \(i=1,\cdots,m\) and \(j=1,\cdots,n\).

2.2.6.4 Matrix Addition

Addition of two matrices \(\boldsymbol{A}\) and \(\boldsymbol{B}\) is elementwise, and is only defined if \(\boldsymbol{A}\) and \(\boldsymbol{B}\) have the same dimension. If \(\boldsymbol{C}=\boldsymbol{A}+\boldsymbol{B}\), then \(c_{ij} = a_{ij} + b_{ij}\), for \(i=1,\cdots,m\) and \(j=1,\cdots,n\).

2.2.6.5 Matrix Product

Product between two matrices \(\boldsymbol{A}\) and \(\boldsymbol{B}\) is denoted by \(\boldsymbol{A}\boldsymbol{B}\). If \(\boldsymbol{C}= \boldsymbol{A}\boldsymbol{B}\), then the elements of \(\boldsymbol{C}\) are defined by \[c_{ij} = \sum_{k=1}^p a_{ik}b_{kj},\] where the dimensions \(\boldsymbol{A}\) and \(\boldsymbol{B}\) must be compatible. The matrix product \(\boldsymbol{A}\boldsymbol{B}\) is defined only if the dimension of \(\boldsymbol{A}\) is \(m\times p\) and the dimension of \(\boldsymbol{B}\) is \(p\times n\), i.e. \(\boldsymbol{A}\) must have the same number of columns as the number of rows of \(\boldsymbol{B}\). In that case, the matrix \(\boldsymbol{C}=\boldsymbol{A}\boldsymbol{B}\) has dimension \(m \times n\).

Matrix product satisfies the following properties: \[\begin{align*} &\alpha\boldsymbol{A}= \boldsymbol{A}\alpha, \text{ where $\alpha$ is a scalar};\\ &\boldsymbol{A}\boldsymbol{B}\neq \boldsymbol{B}\boldsymbol{A}, \text{ in general}; \\ &\alpha(\boldsymbol{A}\boldsymbol{B}) = (\alpha\boldsymbol{A})\boldsymbol{B}= \boldsymbol{A}(\alpha\boldsymbol{B}) = (\boldsymbol{A}\boldsymbol{B})\alpha,\\ &\boldsymbol{A}(\boldsymbol{B}+\boldsymbol{C}) = \boldsymbol{A}\boldsymbol{B}+ \boldsymbol{A}\boldsymbol{C},\\ &(\boldsymbol{A}\boldsymbol{B})^T = \boldsymbol{B}^T\boldsymbol{A}^T. \end{align*}\]

Here is Python code showing matrix multiplication.

import numpy as np

# Define two matrices
matrix_A = np.array([[1, 2],
                     [3, 4]])

matrix_B = np.array([[5, 6],
                     [7, 8]])

# Method 1: Using numpy.dot
result_dot = np.dot(matrix_A, matrix_B)

# Method 2: Using the @ operator
result_operator = matrix_A @ matrix_B

# Print the original matrices and their multiplication results
print("Matrix A:")
print(matrix_A)

print("Matrix B:")
print(matrix_B)

print("Matrix Multiplication using numpy.dot:")
print(result_dot)

print("Matrix Multiplication using @ operator:")
print(result_operator)
Matrix A:
[[1 2]
 [3 4]]
Matrix B:
[[5 6]
 [7 8]]
Matrix Multiplication using numpy.dot:
[[19 22]
 [43 50]]
Matrix Multiplication using @ operator:
[[19 22]
 [43 50]]
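Here is a short sketch (reusing the same example matrices) that numerically checks two of the properties above: non-commutativity and the transpose-of-a-product rule.

import numpy as np

matrix_A = np.array([[1, 2],
                     [3, 4]])
matrix_B = np.array([[5, 6],
                     [7, 8]])

# Matrix multiplication is, in general, not commutative
print("A @ B:\n", matrix_A @ matrix_B)
print("B @ A:\n", matrix_B @ matrix_A)

# The transpose of a product reverses the order: (A B)^T = B^T A^T
print("(A @ B)^T:\n", (matrix_A @ matrix_B).T)
print("B^T @ A^T:\n", matrix_B.T @ matrix_A.T)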

2.2.7 Symmetric Matrices

A symmetric matrix is a square matrix that satisfies the condition \(\boldsymbol{A}= \boldsymbol{A}^T\), i.e. \(a_{ij} = a_{ji}\). We often denote the space of symmetric matrices as \(\mathcal{S}^n\), representing \(n\times n\) symmetric matrices.

Note: Eigen values of a symmetric matrix are all real.

Here is a Python code to generate a symmetric matrix.

import numpy as np

# Specify the size of the symmetric matrix (e.g., 3x3)
matrix_size = 3

# Generate a random square matrix
random_matrix = np.random.rand(matrix_size, matrix_size)

# Make it symmetric by averaging the matrix with its transpose
symmetric_matrix = (random_matrix + random_matrix.T) / 2

# Print the symmetric matrix
print("Symmetric Matrix:")
print(symmetric_matrix)

# Compute the eigenvalues
eigenvalues = np.linalg.eigvals(symmetric_matrix)

# All eigen values should be real
print("Eigen values:", eigenvalues)

# Check if all elements are real
are_all_real = np.isreal(eigenvalues).all()

# Assert that all elements are real
assert are_all_real, "Not all eigen values are real."

# If the assertion passes, it means all elements are real
print("All eigen values are real.")
Symmetric Matrix:
[[0.98021467 0.58854248 0.24022109]
 [0.58854248 0.57669179 0.59537668]
 [0.24022109 0.59537668 0.25981544]]
Eigen values: [ 1.62835128  0.41725829 -0.22888766]
All eigen values are real.

2.2.8 Skew Symmetric Matrices

A skew-symmetric matrix is a square matrix and satisfies the condition \(\boldsymbol{A}= -\boldsymbol{A}^T\), i.e. \(a_{ij} = -a_{ji}\). The diagonal elements of skew-symmetric matrices are always zero, because \(a_{ii} = -a_{ii}\) only when \(a_{ii} = 0\).

Here is a Python code to generate a skew-symmetric matrix.

import numpy as np

# Specify the size of the skew-symmetric matrix (e.g., 3x3)
matrix_size = 3

# Generate a random square matrix
random_matrix = np.random.rand(matrix_size, matrix_size)

# Make it skew-symmetric by subtracting its transpose
skew_symmetric_matrix = random_matrix - random_matrix.T

# Print the skew-symmetric matrix
print("Skew-Symmetric Matrix:")
print(skew_symmetric_matrix)
Skew-Symmetric Matrix:
[[ 0.          0.27010864  0.32163342]
 [-0.27010864  0.          0.41636305]
 [-0.32163342 -0.41636305  0.        ]]

2.2.9 Positive (Semi) Definite Matrices

  • A symmetric matrix \(\boldsymbol{A}\) is positive semi-definite if the eigen values are greater than or equal to zero, i.e. \(\lambda_i(\boldsymbol{A}) \geq 0\), where \(\lambda_i(\boldsymbol{A})\) is the \(i^\text{th}\) eigen value of \(\boldsymbol{A}\). We denote positive semi-definite matrices as \(\boldsymbol{A}\ge 0\). This does not mean element-wise positiveness of the matrix. The space of \(n\times n\) positive semi-definite matrices is denoted by \(\mathcal{S}^n_+\).

  • Positive definite matrices are symmetric matrices whose eigen values are strictly greater than zero, i.e. \(\lambda_i(\boldsymbol{A}) > 0\). We denote positive definite matrices as \(\boldsymbol{A}> 0\). This does not mean element-wise positiveness of the matrix. The space of \(n\times n\) positive definite matrices is denoted by \(\mathcal{S}^n_{++}\).

  • For two matrices \(\boldsymbol{X}\in\mathcal{S}^n\) and \(\boldsymbol{Y}\in\mathcal{S}^n\), the matrix inequality \(\boldsymbol{X}\geq \boldsymbol{Y}\) means \(\boldsymbol{Z}:=\boldsymbol{X}-\boldsymbol{Y}\) is positive semi-definite, i.e. \(\boldsymbol{Z}\geq 0\) or \(\lambda_i(\boldsymbol{Z}) \geq 0\) (a numerical check of this inequality is sketched after the example below).

Here is a Python code generating a positive definite matrix.

import numpy as np

# Specify the size of the positive definite matrix (e.g., 3x3)
matrix_size = 3

# Generate a random symmetric matrix
random_matrix = np.random.rand(matrix_size, matrix_size)
symmetric_matrix = (random_matrix + random_matrix.T) / 2

# Ensure the matrix has positive eigenvalues
eigenvalues = np.linalg.eigvals(symmetric_matrix)
while not all(eigenvalues > 0):
    random_matrix = np.random.rand(matrix_size, matrix_size)
    symmetric_matrix = (random_matrix + random_matrix.T) / 2
    eigenvalues = np.linalg.eigvals(symmetric_matrix)

# Print the positive definite matrix
print("Positive Definite Matrix:")
print(symmetric_matrix)

# Compute the eigenvalues
eigenvalues = np.linalg.eigvals(symmetric_matrix)

# All eigen values should be real and positive
print("Eigen values:", eigenvalues)
Positive Definite Matrix:
[[0.83171201 0.14863323 0.361245  ]
 [0.14863323 0.73692737 0.7212256 ]
 [0.361245   0.7212256  0.89544775]]
Eigen values: [1.69985981 0.69881599 0.06541134]
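The matrix inequality \(\boldsymbol{X}\geq \boldsymbol{Y}\) can be checked numerically by examining the eigen values of \(\boldsymbol{X}-\boldsymbol{Y}\). Here is a sketch with two illustrative symmetric matrices.

import numpy as np

# Two symmetric matrices chosen for illustration
X = np.array([[3.0, 1.0],
              [1.0, 2.0]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# X >= Y in the semi-definite sense iff Z = X - Y has non-negative eigen values
Z = X - Y
eigenvalues_Z = np.linalg.eigvalsh(Z)  # eigvalsh is intended for symmetric matrices

print("Eigen values of Z = X - Y:", eigenvalues_Z)
print("X >= Y holds:", bool(np.all(eigenvalues_Z >= 0)))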

2.2.10 Identity Matrix

An \(n\times n\) identity matrix is denoted by \(\boldsymbol{I}_n\), and is a matrix with all off-diagonal elements equal to zero, and all diagonal elements equal to one. For example, a \(3\times 3\) identity matrix is defined as \[ \boldsymbol{I}_3 = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \] For an \(m\times n\) matrix \(\boldsymbol{M}\), the following are true: \[ \boldsymbol{M}\boldsymbol{I}_n = \boldsymbol{I}_{m}\boldsymbol{M} = \boldsymbol{M}. \]

Here is a Python code computing the identity matrix.

import numpy as np

# Specify the size of the identity matrix (e.g., a 3x3 identity matrix)
matrix_size = 3

# Compute the identity matrix of the specified size
identity_matrix = np.eye(matrix_size)

# Print the identity matrix
print("Identity Matrix:")
print(identity_matrix)
Identity Matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

2.2.11 Determinant of a Matrix

The determinant of a square matrix \(\boldsymbol{A}\) is denoted by \(|\boldsymbol{A}|\).

For a \(2\times 2\) matrix \[ \boldsymbol{A}= \begin{bmatrix}a&b\\c&d\end{bmatrix}, \] the determinant is defined by \[|\boldsymbol{A}| = ad-bc.\]

For a \(3\times 3\) matrix, \[\boldsymbol{B}= \begin{bmatrix}a & b & c \\ d& e& f\\ g & h & i \end{bmatrix},\] the determinant is defined by \[ |\boldsymbol{B}| = a\left|\begin{matrix}e&f\\h&i\end{matrix}\right| - b\left|\begin{matrix}d&f\\g&i\end{matrix}\right| + c\left|\begin{matrix}d&e\\g&h\end{matrix}\right| = a(ei-hf) -b(di-gf) + c(dh-ge). \]

Here is a Python code computing matrix determinant.

import numpy as np

# Define a square matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Compute the determinant
determinant = np.linalg.det(matrix)

# Print the original matrix and its determinant
print("Matrix:")
print(matrix)

print(f"Determinant: {determinant}")
Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Determinant: -9.51619735392994e-16

Note: The exact determinant of this matrix is zero (its rows are linearly dependent, consistent with the rank of 2 computed earlier); the tiny non-zero value is floating-point round-off.

2.2.12 Matrix Inverse

Matrix inverse is defined only for square matrices and is denoted by \(\boldsymbol{M}^{-1}\). For an \(n\times n\) matrix \(\boldsymbol{M}\), \(\boldsymbol{M}^{-1}\) satisfies the following condition \[ \boldsymbol{M}\boldsymbol{M}^{-1} = \boldsymbol{M}^{-1}\boldsymbol{M} = \boldsymbol{I}_n. \]

Note: Matrix inverse exists only when the determinant is non-zero.

For a \(2\times 2\) matrix \[ \boldsymbol{A}= \begin{bmatrix}a&b\\c&d\end{bmatrix}, \] the inverse is given by \[\boldsymbol{A}^{-1} = \frac{1}{|\boldsymbol{A}|}\begin{bmatrix}d&-b\\-c&a\end{bmatrix},\] assuming \(|\boldsymbol{A}|\neq 0\).

The matrix inverse satisfies the following properties: \[\begin{align*} & (\boldsymbol{A}^{-1})^{-1} = \boldsymbol{A},\\ &(\alpha\boldsymbol{A})^{-1} = \frac{1}{\alpha}\boldsymbol{A}^{-1}, \text{ for $\alpha \neq 0$},\\ &(\boldsymbol{A}^T)^{-1} = (\boldsymbol{A}^{-1})^{T},\\ &(\boldsymbol{A}\boldsymbol{B})^{-1} = \boldsymbol{B}^{-1}\boldsymbol{A}^{-1}, \text{ assuming $\boldsymbol{A}$ and $\boldsymbol{B}$ are invertible};\\ &|\boldsymbol{A}^{-1}| = \frac{1}{|\boldsymbol{A}|}. \end{align*}\]

Here is a Python code computing matrix inverse.

import numpy as np

# Define a matrix
matrix = np.array([[1, 2],
                   [3, 4]])

# Compute the inverse
matrix_inverse = np.linalg.inv(matrix)

# Print the original matrix and its inverse
print("Matrix:")
print(matrix)

print("Inverse:")
print(matrix_inverse)
Matrix:
[[1 2]
 [3 4]]
Inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]

2.2.13 Moore-Penrose Inverse

For a matrix \(\boldsymbol{A}\in\mathcal{R}^{m\times n}\) with full column rank (so that \(\boldsymbol{A}^T\boldsymbol{A}\) is invertible), the Moore-Penrose inverse (or pseudo-inverse) is denoted by \(\boldsymbol{A}^+\) and given by \[ \boldsymbol{A}^+ := (\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T. \] More generally, the pseudo-inverse is defined for any matrix, e.g., via the singular value decomposition.

Some properties of \(\boldsymbol{A}^+\) include:

  1. For any matrix \(\boldsymbol{A}\) there is one and only one pseudoinverse \(\boldsymbol{A}^+\).
  2. If \(\boldsymbol{A}\) is invertible, then \(\boldsymbol{A}^{-1} = \boldsymbol{A}^+\)
  3. For \(\boldsymbol{A}\in\mathcal{R}^{m\times n}\) with full column rank, the left inverse is \(\boldsymbol{A}^+ = (\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T\), and \(\boldsymbol{A}^+\boldsymbol{A}= \boldsymbol{I}_n\).
  4. For \(\boldsymbol{A}\in\mathcal{R}^{m\times n}\) with full row rank, the right inverse is \(\boldsymbol{A}^+ = \boldsymbol{A}^T(\boldsymbol{A}\boldsymbol{A}^T)^{-1}\), and \(\boldsymbol{A}\boldsymbol{A}^+= \boldsymbol{I}_m\).

Here is a Python code computing the Moore-Penrose (or pseudo) inverse.

import numpy as np

# Define a matrix
matrix = np.array([[1, 2],
                   [3, 4]])

# Compute the pseudo-inverse
pseudo_inverse = np.linalg.pinv(matrix)

# Print the original matrix and its pseudo-inverse
print("Matrix:")
print(matrix)

print("Pseudo-Inverse:")
print(pseudo_inverse)
Matrix:
[[1 2]
 [3 4]]
Pseudo-Inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]
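For a non-square matrix the pseudo-inverse acts as a left (or right) inverse, as in properties 3 and 4 above. Here is a sketch for an illustrative tall matrix with full column rank.

import numpy as np

# A tall matrix with full column rank (more rows than columns)
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# The pseudo-inverse acts as a left inverse: A^+ A = I_n
A_pinv = np.linalg.pinv(A)

print("Pseudo-Inverse A^+:\n", A_pinv)
print("A^+ @ A (should be the 2x2 identity):\n", A_pinv @ A)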

2.2.14 Orthonormal Matrices

An \(n\times n\) square matrix satisfying the condition \(\boldsymbol{A}\boldsymbol{A}^T = \boldsymbol{I}_n\) is called orthogonal (or orthonormal, since its columns form an orthonormal set). Also, if \(\boldsymbol{A}\boldsymbol{A}^T = \boldsymbol{I}_n\), then \(\boldsymbol{A}^T\boldsymbol{A}= \boldsymbol{I}_n\).

Here is a Python code checking if a matrix is orthogonal.

import numpy as np

# Define a square matrix 
matrix = np.array([[1, 0],
                   [0, -1]])

# Check if the matrix is square
if matrix.shape[0] == matrix.shape[1]:
    # Compute the product of the matrix and its transpose
    product_matrix_transpose = np.dot(matrix, matrix.T)
    
    # Check if the product is close to the identity matrix within a tolerance
    is_orthonormal = np.allclose(product_matrix_transpose, np.eye(matrix.shape[0]))
    
    if is_orthonormal:
        print("The matrix is orthonormal.")
    else:
        print("The matrix is not orthonormal.")
else:
    print("The matrix is not square. Orthonormality is not defined.")
The matrix is orthonormal.

2.2.15 Kronecker Product

Kronecker product between two matrices \(\boldsymbol{A}\in\mathcal{R}^{p\times q}\) and \(\boldsymbol{B}\in\mathcal{R}^{m\times n}\) results in a new matrix \(\boldsymbol{C}\in\mathcal{R}^{pm\times qn}\) defined by \[ \boldsymbol{C}:= \boldsymbol{A}\otimes\boldsymbol{B}= \begin{bmatrix}a_{11}\boldsymbol{B}& a_{12}\boldsymbol{B}& \cdots & a_{1q}\boldsymbol{B}\\ a_{21}\boldsymbol{B}& a_{22}\boldsymbol{B}& \cdots & a_{2q}\boldsymbol{B}\\ \vdots & \vdots& & \vdots \\ a_{p1}\boldsymbol{B}& a_{p2}\boldsymbol{B}& \cdots & a_{pq}\boldsymbol{B} \end{bmatrix}. \]

Kronecker products have the following properties:

  1. \(\boldsymbol{A}\otimes(\boldsymbol{B}+\boldsymbol{C}) = \boldsymbol{A}\otimes\boldsymbol{B}+ \boldsymbol{A}\otimes\boldsymbol{C}\)
  2. \((\boldsymbol{B}+\boldsymbol{C})\otimes\boldsymbol{A}= \boldsymbol{B}\otimes\boldsymbol{A}+ \boldsymbol{C}\otimes\boldsymbol{A}\)
  3. \((\alpha\boldsymbol{A})\otimes\boldsymbol{B}= \boldsymbol{A}\otimes(\alpha\boldsymbol{B}) = \alpha(\boldsymbol{A}\otimes\boldsymbol{B})\)
  4. \((\boldsymbol{A}\otimes\boldsymbol{B})\otimes\boldsymbol{C}= \boldsymbol{A}\otimes(\boldsymbol{B}\otimes\boldsymbol{C})\)
  5. \(\boldsymbol{A}\otimes\boldsymbol{0} = \boldsymbol{0}\otimes\boldsymbol{A}= \boldsymbol{0}\)
  6. \((\boldsymbol{A}\otimes\boldsymbol{B})(\boldsymbol{C}\otimes\boldsymbol{D}) = (\boldsymbol{A}\boldsymbol{C})\otimes(\boldsymbol{B}\boldsymbol{D})\)
  7. \((\boldsymbol{A}\otimes\boldsymbol{B})^{-1} = \boldsymbol{A}^{-1}\otimes\boldsymbol{B}^{-1}\), for invertible \(\boldsymbol{A}\) and \(\boldsymbol{B}\)
  8. \((\boldsymbol{A}\otimes\boldsymbol{B})^{+} = \boldsymbol{A}^{+}\otimes\boldsymbol{B}^{+}\)
  9. \((\boldsymbol{A}\otimes\boldsymbol{B})^T = \boldsymbol{A}^T\otimes\boldsymbol{B}^T\)
  10. \(\textbf{det}\left(\boldsymbol{A}\otimes\boldsymbol{B}\right) = \textbf{det}\left(\boldsymbol{A}\right)^m\textbf{det}\left(\boldsymbol{B}\right)^n\), for \(\boldsymbol{A}\in\mathcal{R}^{n\times n}\) and \(\boldsymbol{B}\in\mathcal{R}^{m\times m}\).
  11. For \(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}= \boldsymbol{C}\), \[\textbf{vec}\left(\boldsymbol{C}\right) = \textbf{vec}\left(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}\right) = (\boldsymbol{B}^T\otimes\boldsymbol{A})\textbf{vec}\left(\boldsymbol{X}\right),\] where \(\textbf{vec}\left(\cdot\right)\) is the vectorization operator that vertically stacks the columns of a matrix.
  12. \(\textbf{rank}\left(\boldsymbol{A}\otimes\boldsymbol{B}\right) = \textbf{rank}\left(\boldsymbol{A}\right)\textbf{rank}\left(\boldsymbol{B}\right)\)

Here is a Python code computing Kronecker product.

import numpy as np

# Define two matrices
matrix_A = np.array([[1, 2],
                     [3, 4]])

matrix_B = np.array([[0, 5],
                     [6, 7]])

# Compute the Kronecker product
kron_product = np.kron(matrix_A, matrix_B)

# Print the original matrices and the Kronecker product
print("Matrix A:")
print(matrix_A)

print("Matrix B:")
print(matrix_B)

print("Kronecker Product:")
print(kron_product)
Matrix A:
[[1 2]
 [3 4]]
Matrix B:
[[0 5]
 [6 7]]
Kronecker Product:
[[ 0  5  0 10]
 [ 6  7 12 14]
 [ 0 15  0 20]
 [18 21 24 28]]
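Property 11 (the vec identity) can also be verified numerically. Here is a sketch with illustrative matrices, where vec stacks the columns of a matrix vertically.

import numpy as np

# Illustrative matrices for the identity vec(A X B) = (B^T kron A) vec(X)
A = np.array([[1, 2],
              [3, 4]])
X = np.array([[5, 6],
              [7, 8]])
B = np.array([[9, 1],
              [2, 3]])

def vec(M):
    # Stack the columns of M vertically (column-major order)
    return M.flatten(order='F')

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)

print("vec(A X B)          :", lhs)
print("(B^T kron A) vec(X) :", rhs)
print("Identity holds      :", np.allclose(lhs, rhs))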

2.2.16 Trace of a Square Matrix

For \(\boldsymbol{A}\in\mathcal{R}^{n\times n}\), the trace of the matrix is denoted by \(\textbf{tr}\left[\boldsymbol{A}\right]\) and is defined as the sum of the diagonal terms, i.e. \[ \textbf{tr}\left[\boldsymbol{A}\right] = \sum_{i=1}^n a_{ii}. \]

The trace operator has the following properties:

  1. It is a linear operator, i.e. \[\textbf{tr}\left[\boldsymbol{A}+\boldsymbol{B}\right] = \textbf{tr}\left[\boldsymbol{A}\right] + \textbf{tr}\left[\boldsymbol{B}\right].\]
  2. Trace of a scalar-matrix product \[\textbf{tr}\left[\alpha \boldsymbol{A}\right] = \alpha\textbf{tr}\left[\boldsymbol{A}\right].\]
  3. Trace of a matrix-matrix product between compatible matrices \[\textbf{tr}\left[\boldsymbol{A}^T\boldsymbol{B}\right] = \textbf{tr}\left[\boldsymbol{A}\boldsymbol{B}^T\right] = \textbf{tr}\left[\boldsymbol{B}^T\boldsymbol{A}\right] = \textbf{tr}\left[\boldsymbol{B}\boldsymbol{A}^T\right] = \sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ij}.\]
  4. If \(\boldsymbol{A}\) and \(\boldsymbol{B}\) are positive semi-definite matrices of the same size, then the following inequality holds \[ 0 \leq \textbf{tr}\left[\boldsymbol{A}\boldsymbol{B}\right]^2 \leq \textbf{tr}\left[\boldsymbol{A}^2\right]\textbf{tr}\left[\boldsymbol{B}^2\right]\leq \textbf{tr}\left[\boldsymbol{A}\right]^2\textbf{tr}\left[\boldsymbol{B}\right]^2. \]
  5. Trace of Kronecker Product \[\textbf{tr}\left[\boldsymbol{A}\otimes\boldsymbol{B}\right] = \textbf{tr}\left[\boldsymbol{A}\right]\textbf{tr}\left[\boldsymbol{B}\right].\]
  6. Trace and Eigen Values \[\textbf{tr}\left[\boldsymbol{A}\right] = \sum_{i=1}^n\lambda_i(\boldsymbol{A}).\]

Here is a Python code computing trace of a matrix.

import numpy as np

# Define a matrix as a 2D NumPy array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Compute the trace of the matrix
trace = np.trace(matrix)

# Print the matrix and its trace
print("Matrix:")
print(matrix)
print(f"Trace: {trace}")
Matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Trace: 15
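Property 6 above (trace equals the sum of the eigen values) can be checked numerically. Here is a short sketch with the same illustrative matrix.

import numpy as np

# Define a square matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# The trace equals the sum of the eigen values
trace = np.trace(matrix)
eigenvalue_sum = np.sum(np.linalg.eigvals(matrix))

print("Trace               :", trace)
print("Sum of eigen values :", eigenvalue_sum.real)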

2.2.17 Matrix Inner Product

The Frobenius inner product between two matrices is defined as \[ \langle \boldsymbol{A},\boldsymbol{B}\rangle := \textbf{tr}\left[\boldsymbol{A}^T\boldsymbol{B}\right]. \]

Here is a Python code checking if two matrices are orthogonal.

import numpy as np

# Define two square matrices 
matrix_A = np.array([[1, 0],
                     [0, -1]])

matrix_B = np.array([[0, 1],
                     [-1, 0]])

# If trace(A'*B) = 0, A and B are orthogonal (in the Frobenius inner product).
print("trace(A'*B) =",np.trace(np.dot(matrix_A.T, matrix_B)))

if np.allclose(np.trace(np.dot(matrix_A.T, matrix_B)),0):
  print("Matrix A and Matrix B are orthogonal")
else:
  print("Matrix A and Matrix B are not orthogonal")   
trace(A'*B) = 0
Matrix A and Matrix B are orthogonal

2.2.18 Matrix Norms

For a matrix \(\boldsymbol{A}\in\mathcal{R}^{m\times n}\) defined by elements \(a_{ij}\),

  • \(\|\boldsymbol{A}\|_1 := \max_{1\leq j \leq n} \sum_{i=1}^m |a_{ij}|\), which is the maximum absolute column sum of the matrix.
  • \(\|\boldsymbol{A}\|_2 := \sqrt{\lambda_\text{max}(\boldsymbol{A}^T\boldsymbol{A})} = \sigma_\text{max}(\boldsymbol{A})\), where \(\sigma_\text{max}(\boldsymbol{A})\) is the largest singular value of \(\boldsymbol{A}\).
  • \(\|\boldsymbol{A}\|_\infty := \max_{1\leq i \leq m} \sum_{j=1}^n |a_{ij}|\), which is the maximum absolute row sum of the matrix.

Here is a Python code computing matrix norms.

import numpy as np
from numpy.linalg import norm

# Define a matrix
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Compute different norms
frobenius_norm = norm(A, 'fro') # Frobenius norm
l1_norm = norm(A, 1)           # L1 norm (maximum column sum)
l2_norm = norm(A, 2)           # L2 norm (largest singular value)
max_norm = norm(A, np.inf)     # Max (or infinity) norm (maximum row sum)

# Display the results
print("Matrix A:\n", A)
print("Frobenius Norm of A:", frobenius_norm)
print("L1 Norm of A:", l1_norm)
print("L2 Norm of A:", l2_norm)
print("Max Norm of A:", max_norm)
Matrix A:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Frobenius Norm of A: 16.881943016134134
L1 Norm of A: 18.0
L2 Norm of A: 16.84810335261421
Max Norm of A: 24.0

2.2.19 Singular Values of a Matrix

Given a matrix \(\boldsymbol{A}\) of size \(m \times n\), its singular values are defined as the square roots of the eigenvalues of the matrix \(\boldsymbol{A}^T\boldsymbol{A}\) (or \(\boldsymbol{A}\boldsymbol{A}^T\)), i.e. \[ \sigma_i(\boldsymbol{A}) := \sqrt{\lambda_i(\boldsymbol{A}^T\boldsymbol{A})}. \]

Singular values provide insight into the geometric properties of the matrix, such as its rank, range, and null space. They are always non-negative real numbers and are usually presented in descending order.

The singular values tell us about the ‘stretching’ effect of the matrix transformation in various directions. In the context of SVD, a matrix \(\boldsymbol{A}\) can be decomposed as \(\boldsymbol{A}= \boldsymbol{U}\boldsymbol{\Sigma} \boldsymbol{V}^T\), where \(\boldsymbol{U}\) and \(\boldsymbol{V}\) are orthogonal matrices, and \(\boldsymbol{\Sigma}\) is a diagonal matrix whose diagonal entries are the singular values of \(\boldsymbol{A}\).

Here’s a Python script using NumPy to compute the singular values of a given matrix:

import numpy as np

# Define a matrix
A = np.array([[1, 2], [3, 4], [5, 6]])

# Compute the singular values
U, singular_values, Vt = np.linalg.svd(A)

# Display the singular values
print("Singular Values of A:\n", singular_values)
Singular Values of A:
 [9.52551809 0.51430058]

Since \(\boldsymbol{A}^T\boldsymbol{A}\) (or \(\boldsymbol{A}\boldsymbol{A}^T\)) is positive semi-definite, its eigen values are real and non-negative, so the singular values are well defined. The largest and smallest singular values of a matrix hold significant information about the properties and behavior of the matrix in various mathematical and practical contexts. Understanding their importance is crucial in linear algebra, data analysis, and machine learning.

The largest singular value has several important implications, such as:

  1. Indicates Maximum Stretch: The largest singular value of a matrix represents the maximum “stretching” factor of the matrix. It gives the largest factor by which the matrix can stretch a vector during the transformation.

  2. Norm of the Matrix: The largest singular value is equal to the 2-norm (or spectral norm) of the matrix. This norm represents the maximum length to which the matrix can stretch a unit vector.

  3. Stability and Conditioning: In numerical linear algebra, the largest singular value is used to assess the conditioning of the matrix. A very large singular value, especially when compared to the smallest singular value, can indicate that the matrix is ill-conditioned, leading to numerical instability in solutions of linear systems involving the matrix.

The smallest singular value also has several important implications, such as:

  1. Indicates Minimum Stretch and Rank: The smallest singular value provides insight into how much the matrix compresses vectors in the least stretchable direction. If the smallest singular value is zero, it indicates that the matrix is rank-deficient (not full rank), meaning there are linearly dependent columns or rows.

  2. Measure of Invertibility: For a square matrix, if the smallest singular value is significantly greater than zero, the matrix is well-conditioned and invertible. A very small (especially zero) singular value in a square matrix implies that the matrix is near singular or singular, and thus not invertible.

  3. Pseudo-Inverse and Regularization: In machine learning and statistics, the smallest singular value plays a critical role in regularization and the computation of the Moore-Penrose pseudo-inverse. When dealing with ill-posed problems or in the presence of collinearity, small singular values are regularized to stabilize the solution.

  4. Data Analysis and Principal Component Analysis (PCA): In PCA, small singular values (and their corresponding singular vectors) often correspond to noise or less significant components of the data. Eliminating these components (dimensionality reduction) can help in focusing on the more significant patterns represented by larger singular values.

In summary, the largest singular value reflects the maximum stretching ability and overall norm of a matrix, while the smallest singular value indicates its invertibility, conditioning, and the presence of redundant or less significant components. In practical applications, these singular values are crucial for understanding the stability, efficiency, and effectiveness of numerical methods and data analysis techniques.
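The ratio of the largest to the smallest singular value is the 2-norm condition number mentioned above. Here is a short sketch (with an illustrative matrix) computing it both directly from the singular values and with numpy.linalg.cond.

import numpy as np

# Define a matrix
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Singular values are returned in descending order
singular_values = np.linalg.svd(A, compute_uv=False)

# The 2-norm condition number is sigma_max / sigma_min
condition_number = singular_values[0] / singular_values[-1]

print("Singular values               :", singular_values)
print("Condition number (sigma ratio):", condition_number)
print("numpy.linalg.cond(A)          :", np.linalg.cond(A))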

2.2.20 Matrix Decompositions

2.2.20.1 LU Decomposition

LU decomposition is a method of decomposing a square matrix \(\boldsymbol{A}\) into the product of a lower triangular matrix \(\boldsymbol{L}\) and an upper triangular matrix \(\boldsymbol{U}\). In mathematical terms, if \(\boldsymbol{A}\) is a square matrix, then the LU decomposition is given by \(\boldsymbol{A}= \boldsymbol{LU}\), where

  • \(\boldsymbol{L}\) is a lower triangular matrix (all entries above the main diagonal are zero),
  • \(\boldsymbol{U}\) is an upper triangular matrix (all entries below the main diagonal are zero).

LU decomposition is used in various applications such as:

  1. Solving Linear Systems: It is used to solve systems of linear equations. Once a matrix is decomposed into \(\boldsymbol{L}\) and \(\boldsymbol{U}\), it is easier to solve the equation \(\boldsymbol{Ax = b}\) by first solving \(\boldsymbol{Ly = b}\) for \(\boldsymbol{y}\) and then solving \(\boldsymbol{Ux = y}\) for \(\boldsymbol{x}\).

  2. Matrix Inversion: LU decomposition can be used to find the inverse of a matrix, which is significantly faster than direct computation, especially for large matrices.

  3. Determinant Calculation: The determinant of \(\boldsymbol{A}\) can be easily calculated as the product of the diagonal entries of \(\boldsymbol{L}\) and \(\boldsymbol{U}\) (up to the sign of the permutation when pivoting is used), since the determinant of a triangular matrix is the product of its diagonal entries.

Here is a Python code computing LU Decomposition of a matrix.

import numpy as np
from scipy.linalg import lu

# Define a square matrix
A = np.array([[4, 3], [6, 3]])

# Perform LU Decomposition
P, L, U = lu(A)

# Display the results
print("Original Matrix:\n", A)
print("Permutation Matrix (P):\n", P)
print("Lower Triangular Matrix (L):\n", L)
print("Upper Triangular Matrix (U):\n", U)

# Verify the decomposition
print("Verification (P * L * U):\n", np.dot(P, np.dot(L, U)))
Original Matrix:
 [[4 3]
 [6 3]]
Permutation Matrix (P):
 [[0. 1.]
 [1. 0.]]
Lower Triangular Matrix (L):
 [[1.         0.        ]
 [0.66666667 1.        ]]
Upper Triangular Matrix (U):
 [[6. 3.]
 [0. 1.]]
Verification (P * L * U):
 [[4. 3.]
 [6. 3.]]
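Here is a sketch (with an assumed right-hand side b) that reuses the LU factorization to solve a linear system, using scipy.linalg.lu_factor and lu_solve.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Define a square matrix and an illustrative right-hand side vector
A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
b = np.array([10.0, 12.0])

# Factor A once, then reuse the factorization to solve A x = b
lu, piv = lu_factor(A)
x = lu_solve((lu, piv), b)

print("Solution x        :", x)
print("Verification A @ x:", A @ x)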

2.2.20.2 QR Factorization of a Matrix

QR factorization is a method for decomposing a matrix into two components: an orthogonal matrix \(\boldsymbol{Q}\) and an upper triangular matrix \(\boldsymbol{R}\). For a given matrix \(\boldsymbol{A}\), the QR factorization is expressed as \(\boldsymbol{A}= \boldsymbol{Q}\boldsymbol{R}\), where:

  • \(\boldsymbol{Q}\) is an orthogonal matrix, i.e., \(\boldsymbol{Q}^T \boldsymbol{Q} = \boldsymbol{Q} \boldsymbol{Q}^T = \boldsymbol{I}\), where \(\boldsymbol{I}\) is the identity matrix
  • \(\boldsymbol{R}\) is an upper triangular matrix.

QR factorization is utilized in various mathematical and computational applications, such as:

  1. Solving Linear Systems: Similar to LU decomposition, QR factorization can be used to solve linear systems \(\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}\). The system is solved by first computing \(\boldsymbol{Q}^T \boldsymbol{b}\) and then solving the upper triangular system \(\boldsymbol{R}\boldsymbol{x} = \boldsymbol{Q}^T \boldsymbol{b}\).
  2. Eigenvalue Computation: QR factorization is a key component in algorithms for computing eigenvalues of a matrix, particularly in the QR algorithm for eigenvalue computation.
  3. Least Squares Fitting: In statistical analysis and data fitting, QR factorization is often used to solve least squares problems efficiently, especially when \(\boldsymbol{A}\) is not square or is ill-conditioned.

Here is a Python code computing QR Decomposition of a matrix.

import numpy as np
from scipy.linalg import qr

# Define a matrix
A = np.array([[12, -51, 4], [6, 167, -68], [-4, 24, -41]])

# Perform QR Decomposition
Q, R = qr(A)

# Display the results
print("Original Matrix:\n", A)
print("Orthogonal Matrix (Q):\n", Q)
print("Upper Triangular Matrix (R):\n", R)

# Verify the decomposition
print("Verification (Q * R):\n", np.dot(Q, R))
Original Matrix:
 [[ 12 -51   4]
 [  6 167 -68]
 [ -4  24 -41]]
Orthogonal Matrix (Q):
 [[-0.85714286  0.39428571  0.33142857]
 [-0.42857143 -0.90285714 -0.03428571]
 [ 0.28571429 -0.17142857  0.94285714]]
Upper Triangular Matrix (R):
 [[ -14.  -21.   14.]
 [   0. -175.   70.]
 [   0.    0.  -35.]]
Verification (Q * R):
 [[ 12. -51.   4.]
 [  6. 167. -68.]
 [ -4.  24. -41.]]
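As mentioned above, QR factorization can be used to solve least squares problems. Here is a sketch on illustrative data, solving the triangular system R x = Q^T b and comparing against numpy.linalg.lstsq.

import numpy as np
from scipy.linalg import qr, solve_triangular

# Overdetermined system A x ~ b (illustrative data)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Economy-size QR: Q has orthonormal columns, R is square upper triangular
Q, R = qr(A, mode='economic')

# Least-squares solution: solve the triangular system R x = Q^T b
x = solve_triangular(R, Q.T @ b)

print("Least-squares solution x:", x)
print("numpy.linalg.lstsq check:", np.linalg.lstsq(A, b, rcond=None)[0])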

2.2.20.3 Cholesky Factorization of a Matrix

Cholesky factorization is a method for decomposing a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. For a given matrix \(\boldsymbol{A}\), the Cholesky factorization is expressed as: \(\boldsymbol{A}= \boldsymbol{L}\boldsymbol{L}^T\), where \(\boldsymbol{L}\) is a lower triangular matrix.

Cholesky factorization is used in various applications, including:

  1. Solving Linear Systems: Cholesky factorization is particularly efficient for solving linear systems of the form \(\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}\) when \(\boldsymbol{A}\) is Hermitian and positive-definite. The system can be solved by first solving \(\boldsymbol{L}\boldsymbol{y} = \boldsymbol{b}\) for \(\boldsymbol{y}\), and then solving \(\boldsymbol{L}^T \boldsymbol{x} = \boldsymbol{y}\).

  2. Computing Matrix Inverses: For Hermitian, positive-definite matrices, Cholesky factorization can be used to compute the inverse of \(\boldsymbol{A}\) efficiently.

  3. Monte Carlo Simulations: In financial and engineering simulations, Cholesky factorization is used to generate random samples from multivariate normal distributions.

  4. Optimization Problems: In optimization, especially quadratic programming, Cholesky factorization is used to solve linear equations and invert matrices efficiently, which is often a computational bottleneck.

Here is a Python code computing Cholesky Decomposition of a matrix.

import numpy as np
from scipy.linalg import cholesky

# Define a positive definite matrix
A = np.array([[4, 12, -16], [12, 37, -43], [-16, -43, 98]])

# Perform Cholesky Decomposition
L = cholesky(A, lower=True)

# Display the results
print("Original Matrix:\n", A)
print("Lower Triangular Matrix (L):\n", L)

# Verify the decomposition
print("Verification (L * L^T):\n", np.dot(L, L.T))
Original Matrix:
 [[  4  12 -16]
 [ 12  37 -43]
 [-16 -43  98]]
Lower Triangular Matrix (L):
 [[ 2.  0.  0.]
 [ 6.  1.  0.]
 [-8.  5.  3.]]
Verification (L * L^T):
 [[  4.  12. -16.]
 [ 12.  37. -43.]
 [-16. -43.  98.]]
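Here is a sketch using the Cholesky factor to solve a linear system (with an assumed right-hand side b), via scipy.linalg.cho_factor and cho_solve.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Symmetric positive-definite matrix (same as above) and an illustrative right-hand side
A = np.array([[4.0, 12.0, -16.0],
              [12.0, 37.0, -43.0],
              [-16.0, -43.0, 98.0]])
b = np.array([1.0, 2.0, 3.0])

# Factor A once, then solve A x = b using the Cholesky factor
c, low = cho_factor(A)
x = cho_solve((c, low), b)

print("Solution x        :", x)
print("Verification A @ x:", A @ x)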

2.2.20.4 Matrix Square Root

The matrix square root of a matrix \(\boldsymbol{A}\) is a matrix \(\boldsymbol{B}\) such that \(\boldsymbol{B}\boldsymbol{B} = \boldsymbol{A}\). This means that when \(\boldsymbol{B}\) is multiplied by itself, the result is the original matrix \(\boldsymbol{A}\). The matrix square root is particularly relevant for positive definite matrices.

Matrix square roots are used in various applications, including:

  1. Control Theory: In control theory, the matrix square root is used in the analysis and design of control systems, particularly in the context of state transition matrices and system stability.

  2. Quantum Mechanics: In quantum mechanics, the square root of a density matrix (a matrix representing a quantum state) is often calculated as part of quantum state tomography and other analyses.

  3. Computer Graphics: In computer graphics, matrix square roots are used in algorithms for transformations and in the manipulation of geometric shapes.

  4. Statistics and Data Analysis: In statistics, the matrix square root (particularly the square root of a covariance matrix) is used in multivariate analysis, including principal component analysis (PCA) and other forms of data reduction.

These applications leverage the matrix square root for its ability to simplify and solve complex mathematical problems, particularly in systems analysis and probabilistic modeling.

To compute the square root of a matrix in Python, you can use the scipy.linalg library, which provides the sqrtm function for this purpose.

Here is a Python code computing the matrix square root.

import numpy as np
from scipy.linalg import sqrtm

# Define your matrix A (it should be a positive definite matrix)
A = np.array([[4, 2], [2, 3]])

# Compute the square root of the matrix
sqrt_A = sqrtm(A)

# Verify the result
print("Square Root of A:\n", sqrt_A)
print("Verification (Square Root of A multiplied by itself):\n", np.dot(sqrt_A, sqrt_A))
Square Root of A:
 [[1.91936596 0.56216928]
 [0.56216928 1.63828133]]
Verification (Square Root of A multiplied by itself):
 [[4. 2.]
 [2. 3.]]

2.2.20.5 Singular Value Decomposition

Singular Value Decomposition (SVD) is a key matrix decomposition technique in linear algebra, used in many scientific and engineering fields. It decomposes any \(m\times n\) matrix \(\boldsymbol{A}\) into three matrices \(\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^T\), where

  1. \(\boldsymbol{U}\): An \(m\times m\) unitary matrix. The columns of \(\boldsymbol{U}\), known as the left singular vectors, form an orthonormal basis of \(\mathcal{R}^m\); the first \(\textbf{rank}\left(\boldsymbol{A}\right)\) of them span the range (column space) of \(\boldsymbol{A}\).

  2. \(\boldsymbol{\Sigma}\): An \(m \times n\) rectangular diagonal matrix with non-negative real numbers on the diagonal. The diagonal entries, the singular values of \(\boldsymbol{A}\), are typically arranged in descending order. They represent the lengths of the semi-axes of the ellipsoid described by \(\boldsymbol{A}\). The number of non-zero singular values equals the rank of \(\boldsymbol{A}\).

  3. \(\boldsymbol{V}^T\): An \(n\times n\) unitary matrix. The columns of \(\boldsymbol{V}\), the right singular vectors, form an orthonormal basis of \(\mathcal{R}^n\) (the domain of \(\boldsymbol{A}\)).

Key Properties and Uses of SVD are the following:

  • SVD simplifies the analysis of a matrix by breaking it down into simpler constituent parts.

  • It is instrumental in solving overdetermined or underdetermined linear systems, via computing the pseudo-inverse of a matrix.

  • In data science and machine learning, SVD is crucial for algorithms like Principal Component Analysis (PCA), used in dimensionality reduction, data compression, and noise reduction.

  • It plays a significant role in signal processing and statistics for pattern identification and compact representations.

  • SVD is also used in solving linear least squares problems, common in mathematical modeling and scientific computing.

SVD’s versatility and robustness make it an essential tool in computational and applied mathematics.

Here is a Python code computing SVD of a matrix.

import numpy as np

# Define a matrix
A = np.array([[1, 2], [3, 4], [5, 6]])

# Perform Singular Value Decomposition
U, S, Vt = np.linalg.svd(A)

# Display the results
print("Original Matrix:\n", A)
print("Left Singular Vectors (U):\n", U)
print("Singular Values (S):\n", S)
print("Right Singular Vectors Transposed (Vt):\n", Vt)

# Reconstruct the original matrix
Sigma = np.zeros((A.shape[0], A.shape[1]))
Sigma[:A.shape[1], :A.shape[1]] = np.diag(S)
A_reconstructed = U @ Sigma @ Vt

print("Reconstructed Matrix (U * Sigma * Vt):\n", A_reconstructed)
Original Matrix:
 [[1 2]
 [3 4]
 [5 6]]
Left Singular Vectors (U):
 [[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]
Singular Values (S):
 [9.52551809 0.51430058]
Right Singular Vectors Transposed (Vt):
 [[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]
Reconstructed Matrix (U * Sigma * Vt):
 [[1. 2.]
 [3. 4.]
 [5. 6.]]
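As noted above, the SVD yields the pseudo-inverse. Here is a sketch constructing \(\boldsymbol{A}^+\) from the SVD factors (assuming full rank, so all singular values are non-zero) and comparing with numpy.linalg.pinv.

import numpy as np

# Define a matrix (same as above)
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Build the pseudo-inverse from the SVD: A^+ = V Sigma^+ U^T,
# where Sigma^+ inverts the non-zero singular values (A is assumed full rank here)
U, S, Vt = np.linalg.svd(A)
Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))
Sigma_pinv[:len(S), :len(S)] = np.diag(1.0 / S)
A_pinv = Vt.T @ Sigma_pinv @ U.T

print("Pseudo-inverse from SVD:\n", A_pinv)
print("numpy.linalg.pinv(A)   :\n", np.linalg.pinv(A))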

2.2.20.6 Schur Complement

The Schur complement is a useful concept in linear algebra for analyzing and simplifying block matrices. Given a block matrix: \[ \boldsymbol{M} = \begin{bmatrix} \boldsymbol{A} & \boldsymbol{B} \\ \boldsymbol{C} & \boldsymbol{D} \end{bmatrix}, \]

where \(\boldsymbol{A}\), \(\boldsymbol{B}\), \(\boldsymbol{C}\), and \(\boldsymbol{D}\) are submatrices with \(\boldsymbol{A}\) and \(\boldsymbol{D}\) being square, and \(\boldsymbol{A}\) is invertible, the Schur complement of \(\boldsymbol{A}\) in \(\boldsymbol{M}\) is defined as: \[ \boldsymbol{S} = \boldsymbol{D} - \boldsymbol{C} \boldsymbol{A}^{-1} \boldsymbol{B}.\]

Applications of the Schur Complement include

  • Matrix Inversion: Simplifies the inversion of block matrices.
  • Determinant Calculation: Allows expressing the determinant of \(\boldsymbol{M}\) in terms of the determinant of \(\boldsymbol{A}\) and its Schur complement.
  • Solving Linear Systems: Used in solving partitioned systems of linear equations.
  • Control Theory: Applied in stability analysis and controller design.

Here is a Python code computing Schur Complement.

import numpy as np

# Define the block matrix components
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[9, 10], [11, 12]])
D = np.array([[13, 14], [15, 16]])

# Ensure A is invertible
if np.linalg.det(A) == 0:
    raise ValueError("Matrix A is not invertible.")

# Calculate the Schur complement of A in M
# Schur complement S = D - C * A^-1 * B
A_inv = np.linalg.inv(A)
S = D - np.dot(C, np.dot(A_inv, B))

print("Schur Complement of A in M:\n", S)
Schur Complement of A in M:
 [[ 0.00000000e+00 -5.32907052e-15]
 [ 0.00000000e+00 -1.06581410e-14]]
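The determinant application mentioned above uses the identity det(M) = det(A) det(S). Here is a sketch verifying it on illustrative blocks (chosen so that the Schur complement is non-singular).

import numpy as np

# Block matrix components (A chosen to be invertible)
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
D = np.array([[4.0, 0.0],
              [0.0, 4.0]])

# Assemble the block matrix M and the Schur complement S of A in M
M = np.block([[A, B], [C, D]])
S = D - C @ np.linalg.inv(A) @ B

# Determinant identity: det(M) = det(A) * det(S)
print("det(M)          :", np.linalg.det(M))
print("det(A) * det(S) :", np.linalg.det(A) * np.linalg.det(S))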

2.2.20.7 Schur Decomposition

Schur Decomposition is a key technique in linear algebra for decomposing square matrices. It represents any square matrix \(\boldsymbol{A}\) as a product of three matrices \[ \boldsymbol{A} = \boldsymbol{Q}\boldsymbol{U}\boldsymbol{Q}^T\]

Where:

  • \(\boldsymbol{Q}\) is a unitary matrix, satisfying \(\boldsymbol{Q}^T\boldsymbol{Q} = \boldsymbol{I}\).
  • \(\boldsymbol{U}\) is an upper triangular matrix (for a real matrix with complex eigenvalues, the real Schur form is quasi-upper triangular, with \(2\times 2\) blocks on the diagonal, as in the output below).
  • \(\boldsymbol{Q}^T\) is the transpose of \(\boldsymbol{Q}\).

Schur Decomposition is used in various applications in numerical linear algebra, including eigenvalue computations and matrix exponentials.

Here is a Python code computing Schur Decomposition of a matrix.

import numpy as np
from scipy.linalg import schur

# Define a square matrix A
A = np.array([[4, -2, 1],
              [7,  0, 5],
              [6,  3, 2]])

# Perform Schur decomposition: A = U @ T @ U.T, where T is the
# (quasi-)upper-triangular Schur form and U is the orthogonal factor
T, U = schur(A)

# Print the original matrix and the Schur decomposition matrices
print("Matrix A:")
print(A)

print("Schur Decomposition - T:")
print(T)

print("Schur Decomposition - U:")
print(U)
Matrix A:
[[ 4 -2  1]
 [ 7  0  5]
 [ 6  3  2]]
Schur Decomposition - T:
[[ 4.39316337  9.00965681 -2.70594944]
 [-0.68611856  4.39316337 -2.9444486 ]
 [ 0.          0.         -2.78632674]]
Schur Decomposition - U:
[[ 0.07068893 -0.99321124  0.09238243]
 [-0.70664211 -0.11522857 -0.69812557]
 [-0.70403125  0.01593157  0.70999027]]
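Here is a short sketch verifying that the factors reconstruct the original matrix.

import numpy as np
from scipy.linalg import schur

# Same illustrative matrix as above
A = np.array([[4, -2, 1],
              [7,  0, 5],
              [6,  3, 2]])

T, U = schur(A)

# Reconstruct the original matrix from the factors: A = U T U^T
print("Verification (U @ T @ U.T):")
print(U @ T @ U.T)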

2.2.20.8 Matrix Exponential

The matrix exponential is a significant concept in linear algebra, extending the exponential function to square matrices. For any square matrix \(\boldsymbol{A}\), the matrix exponential is defined by the series: \[ e^{\mathbf{A}} = \sum_{k=0}^{\infty} \frac{1}{k!} \mathbf{A}^k.\]

Note: This is not element-wise exponential.

Matrix exponentials are crucial in solving linear differential equations, control theory, and quantum mechanics, among other applications.

Here is a Python code showing the computation of a matrix exponential.

import numpy as np
from scipy.linalg import expm

# Define a square matrix
A = np.array([[0, -1], [1, 0]])

# Compute the matrix exponential of A
exp_A = expm(A)

# Display the result
print("Matrix A:\n", A)
print("Matrix Exponential of A (exp(A)):\n", exp_A)
Matrix A:
 [[ 0 -1]
 [ 1  0]]
Matrix Exponential of A (exp(A)):
 [[ 0.54030231 -0.84147098]
 [ 0.84147098  0.54030231]]
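To connect the definition with the computation, here is a sketch that truncates the defining series after N = 20 terms (an assumption adequate for this small matrix) and compares the result with scipy.linalg.expm.

import numpy as np
from math import factorial
from scipy.linalg import expm

# Define a square matrix (same as above)
A = np.array([[0.0, -1.0],
              [1.0, 0.0]])

# Truncate the defining series sum_k A^k / k! after N terms
N = 20
series_sum = np.zeros_like(A)
power = np.eye(2)  # A^0
for k in range(N + 1):
    series_sum = series_sum + power / factorial(k)
    power = power @ A  # advance to A^(k+1)

print("Truncated series approximation:\n", series_sum)
print("scipy.linalg.expm(A)          :\n", expm(A))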