Andrew Abok

The Levenberg-Marquardt algorithm

Nonlinear Least Squares

Nonlinear least squares regression extends linear least squares regression for use with a much larger and more general class of functions.

A nonlinear model is any model of the form $y = f(x; \beta) + \epsilon$. In this case $y$ could, for example, take the form $y = \frac{\beta_0 x}{\beta_1 + x}$.

Consider a set of $m$ possibly nonlinear equations in $n$ variables $x = (x_1, \ldots, x_n)$, which can be written as $f_i(x) = 0,\; i = 1, \ldots, m$. For any $x$ we call $f_i(x)$ the $i$-th residual.

We can seek an approximate solution by finding an $\hat{x}$ that minimizes the sum of squares of the residuals:

$$f_1(x)^2 + \cdots + f_m(x)^2 = \|f(x)\|^2$$

so that $\|f(\hat{x})\|^2 \le \|f(x)\|^2$ for all $x$.

Simply put, this translates to the problem

$$\text{minimize } \|f(x)\|^2$$

The optimality condition is that the partial derivatives of $\|f(x)\|^2$ with respect to each of $x_1, \ldots, x_n$ must vanish at $\hat{x}$.
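Writing the objective as a sum of squared residuals and applying the chain rule, this condition can be stated compactly in terms of the Jacobian $J(x)$ of $f$:

$$\nabla \|f(\hat{x})\|^2 = 2\, J(\hat{x})^T f(\hat{x}) = 0$$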

Gauss–Newton Method

The Gauss-Newton method is a modification of Newton’s method (a root-finding algorithm) for minimizing a sum of squares. Instead of computing the exact second derivative (the Hessian matrix), it approximates it from first derivatives, and the resulting search direction is typically combined with a line search.

In Newton’s method, the update step requires calculating the full Hessian matrix $H$, which involves second-order derivatives of the total error function. This can be expensive and complex. The Gauss-Newton method simplifies this by using the Jacobian matrix to approximate the Hessian.
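To see where the approximation comes from, differentiate the sum-of-squares error $E$ (defined in the next section) twice: the exact Hessian splits into a first-order term and a term involving second derivatives of the residuals, and Gauss-Newton simply drops the latter:

$$\frac{\partial^2 E}{\partial w_i \partial w_j} = \sum_{m=1}^{M} \frac{\partial e_m}{\partial w_i}\frac{\partial e_m}{\partial w_j} + \sum_{m=1}^{M} e_m \frac{\partial^2 e_m}{\partial w_i \partial w_j} \;\approx\; \sum_{m=1}^{M} \frac{\partial e_m}{\partial w_i}\frac{\partial e_m}{\partial w_j} = (J^T J)_{i,j}$$

Dropping the second term is what makes the method cheap, and it is also why convergence degrades when the residuals $e_m$ are large.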

Mathematical Description

Sum of Squared Errors: $E(x, w) = \frac{1}{2}\sum_{m=1}^{M} e_m^2$, where:

  • $x$ is the input vector
  • $w$ is the weight vector
  • $e_m$ is the error for the $m$-th data point, defined as $e_m = y_m - f(x_m, w)$

Gradient of the Error Function: The gradient $g$ is defined as $g = \frac{\partial E(x, w)}{\partial w}$

Combining the definition of the error function with that of the gradient gives us: $g_i = \frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\left(\frac{1}{2}\sum_{m=1}^{M} e_m^2\right)$

The relationship between the gradient vector and the Jacobian matrix $J$ (with entries $J_{m,i} = \frac{\partial e_m}{\partial w_i}$) is: $g = J^T e$

Hessian Matrix: The Hessian matrix $H$ has entries $h_{i,j} = \frac{\partial^2 E}{\partial w_i \partial w_j} = \frac{\partial^2}{\partial w_i \partial w_j}\left(\frac{1}{2}\sum_{m=1}^{M} e_m^2\right)$

The relationship between the Hessian and the Jacobian matrix is approximated as: $H \approx J^T J$

Gauss-Newton Algorithm: The update rule for the Gauss-Newton algorithm is: $w_{k+1} = w_k - (J_k^T J_k)^{-1} J_k^T e_k$

Pseudo code

  1. Initialize the weights $w_0$.
  2. Compute the Jacobian matrix $J$ and the error vector $e$ at the current weights.
  3. Update the weights using $w_{k+1} = w_k - (J^T J)^{-1} J^T e$.
  4. Repeat until convergence.

Implementation in F#

As before, we will do the implementation in F#, using MathNet.Numerics for the vector and matrix types.

// Gauss-Newton iteration: r gives the residual vector, J its Jacobian,
// beta0 the starting parameters. Requires the MathNet.Numerics package.
open MathNet.Numerics.LinearAlgebra

let gaussNewton (r: Vector<float> -> Vector<float>) (J: Vector<float> -> Matrix<float>) (beta0: Vector<float>) (maxIter: int) (tol: float) =
    let rec iterate beta iter =
        if iter > maxIter then beta
        else
            let Jbeta = J beta
            let rBeta = r beta
            // solve the normal equations: delta = -(J^T J)^-1 J^T r
            let delta = -(Jbeta.Transpose() * Jbeta).Inverse() * (Jbeta.Transpose() * rBeta)
            let newBeta = beta + delta
            // stop once the step is smaller than the tolerance
            if delta.L2Norm() < tol then newBeta
            else iterate newBeta (iter + 1)
    iterate beta0 0
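As a quick sanity check, here is a minimal usage sketch (the residual and Jacobian functions below are illustrative, not from the post), fitting the quadratic model $y = \beta_0 x^2 + \beta_1 x + \beta_2$ to the sample data used later in this post. The vector and matrix builders assume the MathNet.Numerics.FSharp package is referenced. Since this model is linear in its parameters, Gauss-Newton converges in a single step.

// Usage sketch: fit y = b0*x^2 + b1*x + b2 to four sample points
// (assumes MathNet.Numerics.FSharp for the vector/matrix builders).
let xs = [ 1.0; 2.0; 3.0; 4.0 ]
let ys = [ 2.2; 2.8; 3.6; 4.5 ]

// residual vector r(beta) = f(x_i; beta) - y_i, one entry per data point
let residuals (beta: Vector<float>) =
    vector (List.map2 (fun x y -> beta.[0] * x ** 2.0 + beta.[1] * x + beta.[2] - y) xs ys)

// Jacobian of the residual vector, one row per data point
let jacobianOfResiduals (_: Vector<float>) =
    matrix (xs |> List.map (fun x -> [ x ** 2.0; x; 1.0 ]))

let fitted = gaussNewton residuals jacobianOfResiduals (vector [ 0.5; 0.5; 0.5 ]) 100 1e-8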

The Levenberg-Marquardt Algorithm

If you take a closer look at the Levenberg-Marquardt method, it combines the advantages of gradient descent (stability) and the Gauss-Newton method (speed). Instead of using a line search like the Gauss-Newton method, Levenberg-Marquardt uses a trust-region strategy.

Without going further into the details, using a trust region avoids one of the weaknesses of Gauss-Newton, which is its behavior when the Jacobian $J(x)$ is rank-deficient. The Levenberg-Marquardt algorithm avoids this through the way the Hessian matrix is approximated; unlike Gauss-Newton, it uses

$$H \approx J^T J + \mu I$$

where $\mu$ is a positive damping (combination) coefficient and $I$ is the identity matrix.

This makes the update rule for Levenberg-Marquardt:

$$w_{k+1} = w_k - (J_k^T J_k + \mu I)^{-1} J_k^T e_k$$
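The damping term is what blends the two methods: when $\mu$ is large, the $\mu I$ term dominates and the step reduces to a small gradient-descent step, while when $\mu$ is small the update approaches the Gauss-Newton step:

$$\Delta w_k \approx -\tfrac{1}{\mu} J_k^T e_k \quad (\mu \text{ large}), \qquad \Delta w_k \approx -(J_k^T J_k)^{-1} J_k^T e_k \quad (\mu \text{ small})$$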

Pseudo Code

  1. Initialize the weights $w_0$, damping parameter $\mu > 0$, and tolerance $\epsilon$.
  2. Repeat until convergence:
    • Step 1: Compute the Jacobian matrix $J_k$ and the error vector $e_k$ at $w_k$.
    • Step 2: Compute the approximate Hessian matrix:
      $H = J_k^T J_k + \mu I$
    • Step 3: Compute the update direction:
      $\Delta w_k = -H^{-1} J_k^T e_k$
    • Step 4: Update the weights:
      $w_{k+1} = w_k + \Delta w_k$
    • Step 5: Evaluate the error function at the new weights $w_{k+1}$:
      • If the error decreases:
        • Accept the new weights $w_{k+1}$.
        • Decrease $\mu$ (e.g., $\mu = \mu / \beta$, where $\beta > 1$).
      • Else:
        • Reject $w_{k+1}$ and increase $\mu$ (e.g., $\mu = \mu \cdot \beta$).
  3. Check for convergence:
    • Stop if $\|\Delta w_k\| < \epsilon$ or the error does not improve significantly.

Implementation in F#

For the purpose of understanding, we can proceed with a basic implementation that solves the linear system with LU decomposition. For real-world use, I’d recommend the FSharp.Stats library for nonlinear fitting, given its robustness.

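The implementation below relies on a solveWithLU helper to solve the linear system $H \,\Delta w = J^T r$. That helper is not shown here, so this is a minimal sketch of what it might look like, assuming plain Gaussian elimination with partial pivoting (an LU-style solve) on float[][] arrays:

// Minimal sketch of a solveWithLU helper: solves A * x = b by Gaussian
// elimination with partial pivoting. A production version would call into
// a linear algebra library instead.
let solveWithLU (A: float[][]) (b: float[]) : float[] =
    let n = b.Length
    // work on copies so the caller's arrays are left untouched
    let a = Array.map Array.copy A
    let x = Array.copy b
    for k in 0 .. n - 1 do
        // partial pivoting: bring the row with the largest pivot to position k
        let pivotRow = [| k .. n - 1 |] |> Array.maxBy (fun i -> abs a.[i].[k])
        if pivotRow <> k then
            let tmpRow = a.[k]
            a.[k] <- a.[pivotRow]
            a.[pivotRow] <- tmpRow
            let tmpB = x.[k]
            x.[k] <- x.[pivotRow]
            x.[pivotRow] <- tmpB
        // eliminate the entries below the pivot
        for i in k + 1 .. n - 1 do
            let factor = a.[i].[k] / a.[k].[k]
            for j in k .. n - 1 do
                a.[i].[j] <- a.[i].[j] - factor * a.[k].[j]
            x.[i] <- x.[i] - factor * x.[k]
    // back substitution on the upper-triangular system
    for i in n - 1 .. -1 .. 0 do
        let mutable s = x.[i]
        for j in i + 1 .. n - 1 do
            s <- s - a.[i].[j] * x.[j]
        x.[i] <- s / a.[i].[i]
    x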
let levenbergMarquardt jacobian func (xData: float[][]) (yData: float[]) (initialParameters: float[]) (maxIter: int) (lambda: float) =
    let mutable parameters = initialParameters
    let mutable lambda = lambda
    let nParams = initialParameters.Length
    let nPoints = xData.Length

    // sum of squared residuals for a given parameter vector
    let sumSquaredError (p: float[]) =
        Array.sumBy (fun i -> (yData.[i] - func xData.[i] p) ** 2.0) [| 0 .. nPoints - 1 |]

    for _ in 1 .. maxIter do
        // Jacobian (one row per data point) and residual vector at the current parameters
        let J = Array.init nPoints (fun i -> jacobian xData.[i] parameters)
        let r = Array.init nPoints (fun i -> yData.[i] - func xData.[i] parameters)

        // J^T J and J^T r
        let JTJ = Array.init nParams (fun i ->
            Array.init nParams (fun j -> Array.sumBy (fun k -> J.[k].[i] * J.[k].[j]) [| 0 .. nPoints - 1 |]))
        let JTr = Array.init nParams (fun i -> Array.sumBy (fun k -> J.[k].[i] * r.[k]) [| 0 .. nPoints - 1 |])

        // modify the approximate Hessian with the Levenberg regularization: H = J^T J + lambda * I
        let H = Array.init nParams (fun i ->
            Array.init nParams (fun j -> if i = j then JTJ.[i].[j] + lambda else JTJ.[i].[j]))

        // solve the linear system H * deltaW = J^T r with LU decomposition
        let deltaW = solveWithLU H JTr
        let candidate = Array.map2 (+) parameters deltaW

        // lambda tuning: accept the step and decrease lambda if the error improves,
        // otherwise reject it and increase lambda
        if sumSquaredError candidate < sumSquaredError parameters then
            parameters <- candidate
            lambda <- lambda / 10.0
        else
            lambda <- lambda * 10.0

    parameters

To verify that our implementation works, we fit a quadratic function to sample data points.

// quadratic model: y = p0 * x^2 + p1 * x + p2
let modelObjfunc (x: float[]) (parameters: float[]) : float =
    parameters.[0] * x.[0] ** 2.0 + parameters.[1] * x.[0] + parameters.[2]

// partial derivatives of the model with respect to each parameter
let jacobian (x: float[]) (parameters: float[]) =
    [| x.[0] ** 2.0; x.[0]; 1.0 |]

let xData = [| [| 1.0 |]; [| 2.0 |]; [| 3.0 |]; [| 4.0 |] |]
let yData = [| 2.2; 2.8; 3.6; 4.5 |]
let initialParameters = [| 0.5; 0.5; 0.5 |]
let maxIterations = 500
let lambda = 0.01

let optimizedParams = levenbergMarquardt jacobian modelObjfunc xData yData initialParameters maxIterations lambda

The results from the above are `[|0.075; 0.395; 1.725|]`. After running the optimization, we can plot the fitted curve using Plotly.NET.
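A minimal plotting sketch, assuming the Plotly.NET package is referenced (the chart arrangement is illustrative):

// plot the sample points against the fitted quadratic
open Plotly.NET

// evaluate the fitted model on a fine grid for a smooth curve
let fitX = [ 1.0 .. 0.1 .. 4.0 ]
let fitY = fitX |> List.map (fun x -> modelObjfunc [| x |] optimizedParams)

[ Chart.Point(Seq.zip (xData |> Array.map (fun x -> x.[0])) yData, Name = "data")
  Chart.Line(Seq.zip fitX fitY, Name = "fitted curve") ]
|> Chart.combine
|> Chart.show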

Conclusions

The Levenberg-Marquardt algorithm is best applied to curve fitting. It blends the best of gradient descent (great for slow, steady progress) and Gauss-Newton (faster but less stable). By adapting its behavior based on how close it is to the best solution, it improves both accuracy and speed. However, like many optimization techniques, it usually finds the closest good solution (a local minimum) rather than the absolute best one (the global minimum). Another challenge is that when the residuals are large, it converges more slowly or fails to converge at all.
