Solving Least Squares Problems
Introduction
We are back at it with linear algebra. In data science, engineering, and applied mathematics, least squares problems are fundamental. Whether you're fitting a regression model, processing signals, or solving systems of linear equations, the least squares method often arises when an exact solution is impossible or impractical.
The linear least squares problem seeks the vector $x$ that minimizes the residual norm:

$$\min_x \|Ax - b\|_2$$

Where:

- $A$ is an $m \times n$ matrix of coefficients,
- $b$ is an $m$-vector of observations,
- $x$ is the vector of unknowns we wish to find.

Depending on the relationship between $m$ and $n$, the system may be overdetermined ($m > n$), square ($m = n$), or underdetermined ($m < n$); least squares most commonly arises in the overdetermined case.
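For example, fitting a straight line $y \approx \beta_0 + \beta_1 t$ to $m$ data points $(t_i, y_i)$ is exactly this problem, with:

$$A = \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}, \qquad x = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \qquad b = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}$$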
The standard approaches to solving the least squares problem are:

- Normal equations
- QR decomposition
- Singular value decomposition (SVD)
- Transformation to an equivalent linear system
There is no single best method in practice: the right choice depends on the size and shape of the matrix, performance requirements, and so on. This is the reason I ended up writing this post; we were trying to determine which method should be the default that gets called whenever a request is made.
1. Normal Equations
This is the classical method for solving least squares problems, introduced by Gauss. It solves the problem by transforming it into the form:

$$A^TAx = A^Tb$$

If the matrix $A$ has full column rank (its columns are linearly independent), then $A^TA$ is symmetric positive definite and the system has a unique solution. This sequence of transformations can be thought of as:

$$Ax = b \;\rightarrow\; A^TAx = A^Tb \;\rightarrow\; x = (A^TA)^{-1}A^Tb$$
Stability Considerations
While the normal equations method is exact in theory, in practice it can suffer from numerical instability. This is because forming $A^TA$ explicitly can lose information to rounding error. Moreover, the condition number of $A^TA$ is the square of that of $A$:

$$\kappa(A^TA) = \kappa(A)^2$$

This squaring effect can significantly worsen stability, making the normal equations unsuitable for poorly conditioned matrices.
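A quick numerical check of this squaring effect, as an illustrative sketch using Math.NET's `ConditionNumber()` (the exact values depend on the random matrix generated):

open MathNet.Numerics.LinearAlgebra

// Illustrative check: cond(A^T A) = cond(A)^2 for any full-rank A.
let demonstrateConditionSquaring () =
    let A = Matrix<float>.Build.Random(100, 10)
    let condA = A.ConditionNumber()
    let condAtA = (A.Transpose() * A).ConditionNumber()
    printfn "cond(A)     = %e" condA
    printfn "cond(A^T A) = %e (cond(A)^2 = %e)" condAtA (condA * condA)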
open MathNet.Numerics.LinearAlgebra

// isPositiveDefinite is not defined in the original snippet;
// one possible implementation checks the eigenvalues of the symmetric matrix.
let isPositiveDefinite (M: Matrix<float>) =
    M.Evd().EigenValues |> Seq.forall (fun ev -> ev.Real > 0.0)

let normalEquationsCholeskySafe (A: Matrix<float>) (b: Vector<float>) =
    let At = A.Transpose()
    let AtA = At * A
    let Atb = At * b
    if isPositiveDefinite AtA then
        // Cholesky: AtA = L L^T, then two triangular solves.
        let cholesky = AtA.Cholesky()
        cholesky.Solve Atb
    else
        // Fall back to a general (LU-based) solve.
        printfn "Matrix is NOT positive definite."
        AtA.Solve Atb
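A usage sketch with made-up numbers (the `matrix` and `vector` builders come from the MathNet.Numerics.FSharp package):

// Hypothetical data: five equations, three unknowns.
let A = matrix [ [ 1.0; 0.0; 1.0 ]
                 [ 2.0; 1.0; 0.0 ]
                 [ 0.0; 1.0; 1.0 ]
                 [ 1.0; 1.0; 1.0 ]
                 [ 3.0; 0.0; 2.0 ] ]
let b = vector [ 1.0; 2.0; 3.0; 4.0; 5.0 ]
normalEquationsCholeskySafe A b |> printfn "x = %A"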
2. QR Decomposition
QR decomposition is a factorization method for an $m \times n$ matrix $A$ (with $m \ge n$) of the form:

$$A = QR$$

where:

- $Q$ is an orthogonal matrix ($Q^TQ = I$),
- $R$ is an upper triangular matrix.

QR decomposition simplifies solving linear least squares problems mainly because it transforms the problem into solving a triangular system.
Construction of QR Decomposition
The core idea in QR revolves around orthogonalizing the columns of $A$, typically via Gram-Schmidt, Givens rotations, or Householder reflections. Rather than diving into full derivations of the Householder transformation (which is widely used in practice given its stability), you can refer to this detailed lecture note for the mathematics.
In simple terms, Householder reflectors use reflection to zero out the lower elements of a column. Conceptually, you transform a column vector $a_k$ into a multiple of $e_1$ by applying the reflector:

$$H = I - 2\frac{vv^T}{v^Tv}, \quad \text{where } v = a_k \pm \|a_k\|_2 e_1$$

Applying these reflectors sequentially, column by column, yields $R$, while the product of the reflectors gives $Q$.
An implementation in F# would look like this:
let householderQR (A: Matrix<float>) =
    let m = A.RowCount
    let n = A.ColumnCount
    let tau = DenseVector.zero n
    let mutable A = A.Clone()
    for j in 0 .. n - 1 do
        // Column to be zeroed below the diagonal.
        let x = A.SubMatrix(j, m - j, j, 1).Column(0)
        let normx = x.L2Norm()
        // Choose the sign that avoids cancellation.
        let s = if x.[0] >= 0.0 then -1.0 else 1.0
        let u1 = x.[0] - s * normx
        let w = x / u1
        w.[0] <- 1.0
        // Store the reflector below the diagonal, R's entry on it.
        for i in 1 .. w.Count - 1 do
            A.[j + i, j] <- w.[i]
        A.[j, j] <- s * normx
        tau.[j] <- -s * u1 / normx
        // Apply the reflection to the remaining columns.
        if j < n - 1 then
            let subA = A.SubMatrix(j, m - j, j + 1, n - j - 1)
            let wT_A = w.ToRowMatrix() * subA
            let update = tau.[j] * w.ToColumnMatrix() * wT_A
            A.SetSubMatrix(j, m - j, j + 1, n - j - 1, subA - update)
    A, tau
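As a sanity check (my own test, not from the original post), the $R$ factor extracted from the packed output can be compared against Math.NET's built-in QR; the two agree up to row signs:

let Atest = Matrix<float>.Build.Random(6, 3)
let packed, _ = householderQR Atest
// R lives on and above the diagonal of the packed result.
let R = packed.SubMatrix(0, 3, 0, 3).UpperTriangle()
let Rbuiltin = Atest.QR().R.SubMatrix(0, 3, 0, 3)
// Compare magnitudes, since Householder QR fixes R only up to row signs.
(R.PointwiseAbs() - Rbuiltin.PointwiseAbs()).Enumerate()
|> Seq.map abs
|> Seq.max
|> printfn "max |R| discrepancy: %e"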
Least Squares Using QR
Multiplication by an orthogonal matrix preserves the Euclidean norm, which is what makes the QR approach numerically stable. Once $A = QR$ is computed, we can write:

$$\|Ax - b\|_2 = \|Q^T(Ax - b)\|_2 = \|Rx - Q^Tb\|_2$$

Since $Q$ is orthogonal, the minimizer is unchanged by the transformation, which leads to solving the triangular system:

$$Rx = Q^Tb$$

where only the first $n$ rows of $R$ and $Q^Tb$ participate.
// Apply Q^T to the vector b using the stored reflectors.
let applyQTransposeToB (A: Matrix<float>) (tau: Vector<float>) (b: Vector<float>) : Vector<float> =
    let m = A.RowCount
    let n = tau.Count
    let bCopy = b.Clone()
    for j in 0 .. n - 1 do
        // Rebuild the j-th reflector from the subdiagonal of A.
        let w = DenseVector.init (m - j) (fun i ->
            if i = 0 then 1.0 else A.[j + i, j])
        let dot = w.DotProduct(bCopy.SubVector(j, m - j))
        let scale = tau.[j] * dot
        for i in 0 .. m - j - 1 do
            bCopy.[j + i] <- bCopy.[j + i] - scale * w.[i]
    bCopy
This can be efficiently solved using back substitution.
// Finally, back-substitute to solve Rx = Q^T b.
let backSubstitute (R: Matrix<float>) (b: Vector<float>) : Vector<float> =
    let n = R.ColumnCount
    let x = DenseVector.zero n
    for i in n - 1 .. -1 .. 0 do
        let sum =
            [ i + 1 .. n - 1 ]
            |> List.sumBy (fun j -> R.[i, j] * x.[j])
        x.[i] <- (b.[i] - sum) / R.[i, i]
    x
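Putting the three pieces together gives a complete QR-based least squares solver (this composition is mine; the post presents the pieces separately):

// Compose: factor A, apply Q^T to b, back-substitute on the top n x n block.
let solveLeastSquaresQR (A: Matrix<float>) (b: Vector<float>) : Vector<float> =
    let n = A.ColumnCount
    let packed, tau = householderQR A
    let qtb = applyQTransposeToB packed tau b
    let R = packed.SubMatrix(0, n, 0, n)  // backSubstitute only reads the upper triangle
    backSubstitute R (qtb.SubVector(0, n))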
3. Singular Value Decomposition
The SVD originates from the concepts of singular values and eigendecomposition. The singular value decomposition has the form:

$$A = U\Sigma V^T$$

where $U$ ($m \times m$) and $V$ ($n \times n$) are orthogonal, and $\Sigma$ ($m \times n$) is diagonal with the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ on its diagonal.
3.1 Geometric Interpretation
SVD can be understood geometrically as:

- Rotation/reflection by $V^T$: reorients the coordinate system in the domain.
- Scaling by $\Sigma$: stretches or compresses along the new coordinate axes.
- Rotation/reflection by $U$: reorients the result in the range.

The range space and null space are directly revealed by the SVD:

- Columns of $V$ corresponding to zero singular values form an orthonormal basis for the null space of $A$.
- Remaining columns of $V$ span the orthogonal complement of the null space.
- Columns of $U$ corresponding to nonzero singular values form an orthonormal basis for the range of $A$.
- Remaining columns of $U$ span the orthogonal complement of the range.
Ordinary Least Squares

For least squares, the SVD yields the pseudoinverse solution:

$$x = A^+b = V\Sigma^+U^Tb$$

where $\Sigma^+$ inverts the nonzero singular values and zeroes the rest. This handles rank-deficient problems gracefully and returns the minimum-norm solution.

In ordinary least squares, we assume the errors live only in the observations $b$. In Total Least Squares (TLS), both $A$ and $b$ are treated as noisy. This is solved by computing the SVD of the augmented matrix $[A \; b]$ and taking the right singular vector associated with its smallest singular value. If that vector is $v = \begin{bmatrix} v_{1:n} \\ v_{n+1} \end{bmatrix}$ with $v_{n+1} \neq 0$, the TLS solution is $x_{TLS} = -v_{1:n}/v_{n+1}$.
Here is an F# implementation:
let solveLeastSquaresSVD (A: Matrix<float>) (b: Vector<float>) : Vector<float> =
    let svd = A.Svd true
    let s = svd.S.ToArray()
    // Treat singular values below this threshold as zero.
    let tolerance = 1e-10 * float (max A.RowCount A.ColumnCount) * s.[0]
    let rank = s |> Array.filter (fun x -> x > tolerance) |> Array.length
    // Sigma^+ : invert only the significant singular values.
    let sigmaInv = DenseMatrix.init rank rank (fun i j ->
        if i = j && s.[i] > tolerance then 1.0 / s.[i] else 0.0)
    let U = svd.U.SubMatrix(0, A.RowCount, 0, rank)
    let Vt = svd.VT.SubMatrix(0, rank, 0, A.ColumnCount)
    let pseudoInverse = Vt.Transpose() * sigmaInv * U.Transpose()
    pseudoInverse * b
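For completeness, here is a sketch of the Total Least Squares construction described above, via the SVD of the augmented matrix $[A \; b]$ (my own illustration; it assumes the last component of the relevant singular vector is nonzero):

// TLS sketch: the solution comes from the right singular vector
// associated with the smallest singular value of [A b].
let solveTotalLeastSquares (A: Matrix<float>) (b: Vector<float>) : Vector<float> =
    let n = A.ColumnCount
    let augmented = A.Append(b.ToColumnMatrix())  // [A b], size m x (n+1)
    let svd = augmented.Svd true
    let v = svd.VT.Row(n)                         // last row of V^T = smallest-sigma vector
    let alpha = v.[n]
    if abs alpha < 1e-12 then failwith "TLS solution does not exist"
    -(v.SubVector(0, n)) / alpha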
Conclusions
We ran a benchmark to compare the performance of all the methods built above, measuring the average time taken by each method over repeated runs.
open System.Diagnostics

let benchmark name n (f: unit -> Vector<float>) =
    let sw = Stopwatch()
    let times = ResizeArray<float>()
    // Keep the last result so the solver call isn't optimized away.
    let mutable last = DenseVector.zero 1
    for _ in 1 .. n do
        sw.Restart()
        last <- f()
        sw.Stop()
        times.Add(float sw.Elapsed.TotalMilliseconds)
    let avgTime = Seq.average times
    printfn "%s ran %d times. Avg time: %.3f ms" name n avgTime
    name, times |> List.ofSeq
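A hypothetical harness wiring the solvers into the benchmark (the problem size and run count are arbitrary choices for illustration); this produces the `results` list consumed by the chart below:

let m, n, runs = 500, 50, 20
let A = Matrix<float>.Build.Random(m, n)
let b = Vector<float>.Build.Random(m)

let results =
    [ benchmark "Normal equations (Cholesky)" runs (fun () -> normalEquationsCholeskySafe A b)
      benchmark "Householder QR"              runs (fun () -> solveLeastSquaresQR A b)
      benchmark "SVD pseudoinverse"           runs (fun () -> solveLeastSquaresSVD A b) ]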
Finally, we used the Plotly.NET library for visualization.
open Plotly.NET

let chart =
    results
    |> List.map (fun (name, times) ->
        let indexedTimes = times |> List.mapi (fun i t -> i, t)
        Chart.Line(indexedTimes, Name = name))
    |> Chart.combine
    |> Chart.withTitle "Solver Benchmark Times (ms)"
    |> Chart.withXAxisStyle (TitleText = "Run Index")
    |> Chart.withYAxisStyle (TitleText = "Time per Run (ms)")

chart |> Chart.show
The full code can be found here.
On running the above solvers, a clear trade-off emerges between computational efficiency and numerical stability. The normal equations with Cholesky factorization are by far the fastest, but their reliability strongly depends on the conditioning of $A$. When $A$ is ill-conditioned, the squared condition number of $A^TA$ amplifies rounding errors and the computed solution can be badly wrong.
The QR decomposition provides a more balanced compromise. Our custom Householder implementation validates the theoretical steps of the algorithm, while Math.NET's optimized QR consistently outperforms it in practice. QR is thus the most practically robust solver for large, well-posed problems, combining stability with speed.
The singular value decomposition (SVD) stands apart as the most reliable method, as it explicitly handles rank deficiency and ill-conditioning. Its downside is computational cost, which our benchmarks clearly reveal. Nevertheless, SVD is indispensable when accuracy is paramount, or when the system is ill-conditioned. This is especially evident in the context of Total Least Squares, where the smallest singular value encodes the perturbation to both $A$ and $b$, yielding the corrected TLS solution.
There are numerous texts that explore these methods for solving linear least squares, in case you want to dive deeper. In practice, one must therefore match the solver to the problem:
- Cholesky for speed when $A$ is well-conditioned,
- QR as the general-purpose workhorse,
- SVD for ill-conditioned or rank-deficient systems, and
- TLS when modeling noise in both $A$ and $b$.