Least Squares Calculator: Solve Overdetermined Systems

Find the least squares solution for overdetermined systems Ax ≈ b with more equations than unknowns. Minimizes ||Ax - b||² using the normal equations A^T A x = A^T b.

Calculator

Enter your matrix below and click "Calculate" to see the step-by-step solution.

For least squares, typically m > n (more equations than unknowns).

Coefficient Matrix A (m × n)

Each row represents one equation

Constant Vector b (m × 1)

Enter an overdetermined system Ax ≈ b to find the least squares solution.

The solution minimizes the squared error ||Ax - b||².

Learn About Least Squares

Understanding the concepts behind the calculations.


What is the Least Squares Method?

The Least Squares Method finds the "best fit" solution to an overdetermined system, one with more equations than unknowns. Instead of an exact solution (which generally doesn't exist for such systems), it finds the solution that minimizes the sum of squared errors.

Core Problem: For Ax ≈ b with m > n, find 𝐱̂ that minimizes:

$$ \min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|^2 = \sum_{i=1}^m ((A\mathbf{x})_i - b_i)^2 $$

💡 Real-world analogy: You have 10 data points but only 2 parameters for a line. You can't hit all points exactly, so you find the line that comes "closest" overall.


The Problem: Overdetermined Systems

When you have more equations than unknowns, an exact solution usually doesn't exist. For example:

$$ \begin{cases} x + y = 1 \\ x - y = 2 \\ 2x + y = 3 \end{cases} $$

Three equations, two unknowns → no exact solution: the first two equations force x = 1.5 and y = −0.5, but then 2x + y = 2.5 ≠ 3. We can still find the best approximate solution using least squares.
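The inconsistency of this example can be checked numerically. The sketch below assumes NumPy is available; it compares the rank of A with the rank of the augmented matrix [A | b] (the system is consistent only when the two are equal) and then computes the least squares solution with `np.linalg.lstsq`.

```python
import numpy as np

# Coefficients and right-hand side of the three-equation example.
A = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# A linear system is consistent exactly when rank(A) == rank([A | b]).
rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
print("rank(A) =", rank_A, " rank([A|b]) =", rank_aug)

# Least squares solution and the squared residual norm it achieves.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
sq_error = np.sum((A @ x_hat - b) ** 2)
print("x_hat =", x_hat, " squared error =", sq_error)
```

Working the normal equations by hand for this system gives x = 23/14 and y = −3/7, with a squared error of 1/14, which is what the script should reproduce up to rounding.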

📌 When does this happen?

  • Linear regression with many data points
  • Curve fitting experiments
  • Sensor data processing
  • Computer vision problems

The Normal Equations

Derivation

The least squares solution satisfies the Normal Equations:

$$ \boxed{A^T A \hat{\mathbf{x}} = A^T \mathbf{b}} $$

Where it comes from:

  1. We want to minimize ‖Ax - b‖²
  2. The gradient is 2AᵀAx - 2Aᵀb
  3. Setting gradient = 0 gives AᵀAx = Aᵀb
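Spelled out, the expansion behind these three steps looks roughly as follows (a sketch using standard matrix calculus identities):

$$ \|A\mathbf{x} - \mathbf{b}\|^2 = (A\mathbf{x} - \mathbf{b})^T (A\mathbf{x} - \mathbf{b}) = \mathbf{x}^T A^T A \mathbf{x} - 2\mathbf{b}^T A \mathbf{x} + \mathbf{b}^T \mathbf{b} $$

$$ \nabla_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|^2 = 2A^T A \mathbf{x} - 2A^T \mathbf{b} $$

Because the Hessian 2AᵀA is positive semidefinite, any point where the gradient vanishes is a minimizer rather than a maximum or saddle point.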

Unique Solution Condition

If A has full column rank (columns are linearly independent), then AᵀA is invertible and:

$$ \hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b} $$
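As a minimal sketch of this formula in code (assuming NumPy, and assuming A has full column rank), the helper below solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly. The function name `solve_normal_equations` and the data are illustrative, not taken from the calculator.

```python
import numpy as np

def solve_normal_equations(A, b):
    """Return x_hat solving (A^T A) x = A^T b; assumes full column rank."""
    AtA = A.T @ A
    Atb = A.T @ b
    # Solving the linear system avoids computing the explicit inverse.
    return np.linalg.solve(AtA, Atb)

# Illustrative data: a tall 4 x 2 matrix with independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 3.0, 5.0, 7.0])  # lies exactly on y = 1 + 2x

x_hat = solve_normal_equations(A, b)
print(x_hat)
```

Since the sample data lie exactly on a line, the result should be close to intercept 1 and slope 2.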

⚠️ Warning: The normal equations can be numerically unstable for ill-conditioned matrices. For better stability, use QR decomposition.
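One way to see where the instability comes from: in the 2-norm, the condition number of AᵀA is the square of the condition number of A. The sketch below (assuming NumPy; the polynomial test matrix is an arbitrary illustration) prints both values for comparison.

```python
import numpy as np

# Illustrative matrix: cubic polynomial basis sampled at 20 points on [0, 1].
t = np.linspace(0.0, 1.0, 20)
A = np.vander(t, 4, increasing=True)

cond_A = np.linalg.cond(A)            # 2-norm condition number of A
cond_AtA = np.linalg.cond(A.T @ A)    # 2-norm condition number of A^T A

print(f"cond(A)     = {cond_A:.3e}")
print(f"cond(A^T A) = {cond_AtA:.3e}")
print(f"cond(A)^2   = {cond_A ** 2:.3e}")
```

The last two printed values should agree closely, which is why a moderately ill-conditioned A can become a badly conditioned AᵀA.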


Geometric Interpretation

The least squares solution has a beautiful geometric meaning:

📐 Projection onto Column Space

A𝐱̂ is the orthogonal projection of 𝐛 onto the column space of A.

$$ A\hat{\mathbf{x}} = \text{proj}_{\text{Col}(A)} \mathbf{b} $$

⟂ Orthogonality Condition

The residual 𝐫 = 𝐛 - A𝐱̂ is perpendicular to every column of A:

$$ A^T \mathbf{r} = \mathbf{0} $$

Visual intuition: Imagine a 3D space. The column space is a plane. The vector 𝐛 is somewhere off the plane. The closest point in the plane to 𝐛 is its perpendicular projection. That projection is A𝐱̂.
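Both geometric facts can be verified numerically. The following sketch assumes NumPy and uses a small illustrative 3 × 2 system; it checks that Aᵀr is zero and that the matrix P = A(AᵀA)⁻¹Aᵀ (valid when A has full column rank) behaves like an orthogonal projector.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
projection = A @ x_hat          # closest point to b in Col(A)
residual = b - projection

# Orthogonality condition: A^T r should be (numerically) the zero vector.
print("A^T r =", A.T @ residual)

# Projection matrix onto Col(A); assumes full column rank.
P = A @ np.linalg.solve(A.T @ A, A.T)
print("P b equals A x_hat:", np.allclose(P @ b, projection))
print("P is idempotent:   ", np.allclose(P @ P, P))
print("P is symmetric:    ", np.allclose(P, P.T))
```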


Complete Example: Linear Regression

Problem: Find the best-fit line y = a + bx through points (1,1), (2,2), (3,2).

Step 1: Set up the system

Each point gives an equation: a + b·x = y

$$ \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \approx \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} $$

Step 2: Compute AᵀA

$$ A^T A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix} $$

Step 3: Compute Aᵀb

$$ A^T \mathbf{b} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 11 \end{pmatrix} $$
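Steps 2 and 3 are easy to check by machine. A minimal sketch, assuming NumPy:

```python
import numpy as np

# Design matrix (column of ones for a, column of x-values for b) and targets.
A = np.array([[1, 1],
              [1, 2],
              [1, 3]])
b = np.array([1, 2, 2])

AtA = A.T @ A   # 2 x 2 matrix of the normal equations
Atb = A.T @ b   # right-hand side of the normal equations
print(AtA)
print(Atb)
```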

Step 4: Solve the normal equations

$$ \begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 5 \\ 11 \end{pmatrix} $$

Step 5: Find a and b

Using Gaussian elimination or Cramer's Rule:

$$ a = \frac{2}{3} \approx 0.667, \quad b = \frac{1}{2} = 0.5 $$
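For a 2 × 2 system, Cramer's Rule is short enough to write out directly. The sketch below uses Python's standard `fractions` module so the arithmetic stays exact; the variable names are illustrative.

```python
from fractions import Fraction

# Normal equations from Step 4: 3a + 6b = 5 and 6a + 14b = 11.
m11, m12, m21, m22 = map(Fraction, (3, 6, 6, 14))
r1, r2 = Fraction(5), Fraction(11)

det = m11 * m22 - m12 * m21        # determinant of A^T A
a = (r1 * m22 - m12 * r2) / det    # first column replaced by the right-hand side
b = (m11 * r2 - r1 * m21) / det    # second column replaced by the right-hand side
print(a, b)
```

Here the determinant is 6, giving a = 4/6 = 2/3 and b = 3/6 = 1/2.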

✅ Result: The least squares line is y = 0.667 + 0.5x

At x=1: predicted 1.167 (actual 1, error -0.167)
At x=2: predicted 1.667 (actual 2, error +0.333)
At x=3: predicted 2.167 (actual 2, error -0.167)
Sum of squared errors = 0.167² + 0.333² + 0.167² ≈ 0.167
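The error table can be reproduced with a few lines of NumPy (a sketch; errors are taken as actual minus predicted, matching the residual convention r = b − Ax̂):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0])
a, b = 2.0 / 3.0, 0.5               # fitted intercept and slope

predicted = a + b * x
errors = y - predicted              # actual minus predicted
sse = np.sum(errors ** 2)           # sum of squared errors

for xi, pi, yi, ei in zip(x, predicted, y, errors):
    print(f"x={xi:.0f}: predicted {pi:.3f}, actual {yi:.0f}, error {ei:+.3f}")
print(f"SSE = {sse:.3f}")
```

The exact sum of squared errors is 1/36 + 1/9 + 1/36 = 1/6.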


Solving Methods

| Method | When to Use | Stability | Speed |
|---|---|---|---|
| Normal Equations + Cholesky | Well-conditioned problems, small to medium size | Moderate | Fast |
| QR Decomposition | Recommended for most problems | Good | Moderate |
| SVD | Ill-conditioned or rank-deficient problems | Excellent | Slow |

💡 Recommendation: For most applications, QR decomposition offers the best balance of speed and stability. Our calculator uses QR decomposition for reliable results.
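To make the three approaches concrete, here is a sketch of each in NumPy on a small illustrative problem. This is not the calculator's actual implementation, which is not shown on this page; it only demonstrates that the three routes agree on a well-conditioned system.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# 1. Normal equations + Cholesky: A^T A = L L^T, then two triangular solves.
#    np.linalg.solve is a general solver used here for brevity; a dedicated
#    triangular solver would normally be preferred.
L = np.linalg.cholesky(A.T @ A)
y = np.linalg.solve(L, A.T @ b)
x_chol = np.linalg.solve(L.T, y)

# 2. Reduced QR: A = Q R, so the solution satisfies R x = Q^T b.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# 3. SVD: A = U diag(s) V^T, so x = V diag(1/s) U^T b (full column rank).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

print(x_chol, x_qr, x_svd)
```

For rank-deficient problems, the SVD route would additionally drop or threshold tiny singular values rather than dividing by them.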


Real-World Applications

📈 Linear Regression

Finding trends in data: sales forecasts, temperature trends, stock predictions. The foundation of predictive analytics.

📊 Polynomial Fitting

Curve fitting for experimental data, sensor calibration, trajectory smoothing.

🖼️ Computer Vision

Homography estimation for image stitching, camera calibration, 3D reconstruction.

🔧 System Identification

Finding system parameters from input-output measurements in control engineering.

📡 Signal Processing

Noise reduction, signal reconstruction, adaptive filtering.

🧬 Bioinformatics

Gene expression analysis, protein structure prediction.


Limitations & Extensions

⚠️ Limitations of Standard Least Squares

  • Outlier sensitivity: A single bad data point can skew results significantly
  • Linearity assumption: Only fits models linear in parameters
  • Collinearity issues: Nearly dependent columns cause instability
  • Equal variance assumption: All measurements treated equally
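The outlier sensitivity is easy to demonstrate. In the sketch below (assuming NumPy; the data are synthetic and chosen only for illustration), five points lie exactly on y = x, and then the last one is moved far off the line.

```python
import numpy as np

def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns the array [a, b]."""
    A = np.column_stack([np.ones_like(x), x])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_clean = x.copy()            # points exactly on y = x
y_outlier = y_clean.copy()
y_outlier[-1] = 14.0          # one corrupted measurement

print("clean fit:       ", fit_line(x, y_clean))
print("with one outlier:", fit_line(x, y_outlier))
```

With the single corrupted point, the fitted line changes from y = x to y = −2 + 3x, so one bad measurement triples the slope.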

🚀 Extensions for Better Results

Weighted Least Squares
Give less weight to unreliable measurements.

Ridge Regression
Adds penalty to prevent overfitting.

LASSO
Automatic feature selection.

Total Least Squares
Accounts for errors in both variables.
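Two of these extensions reduce to small changes to the ordinary problem. The sketch below (assuming NumPy) shows weighted least squares via row scaling by √w and ridge regression via the modified normal equations (AᵀA + λI)x = Aᵀb. The function names, weights, and λ are illustrative choices; this simple ridge variant also penalizes the intercept. LASSO and total least squares need different algorithms and are not shown.

```python
import numpy as np

def weighted_least_squares(A, b, w):
    """Minimize sum_i w_i * ((A x)_i - b_i)^2 for positive weights w."""
    s = np.sqrt(w)
    # Scaling each row by sqrt(w_i) turns the problem into ordinary least squares.
    x, *_ = np.linalg.lstsq(A * s[:, None], b * s, rcond=None)
    return x

def ridge(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 for lam >= 0."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_ols, *_ = np.linalg.lstsq(A, b, rcond=None)
x_wls = weighted_least_squares(A, b, np.array([1.0, 1.0, 0.01]))  # distrust last point
x_ridge = ridge(A, b, lam=1.0)
print(x_ols, x_wls, x_ridge)
```

With all weights equal, or with λ = 0, both functions fall back to the ordinary least squares solution.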

📐 Before using least squares: Check that your matrix has linearly independent columns. If not, AᵀA is singular and the minimizer isn't unique; an SVD-based solver can then return the minimum-norm solution.


How Good Is Your Fit?

R² (Coefficient of Determination)

$$ R^2 = 1 - \frac{\|\mathbf{r}\|^2}{\|\mathbf{b} - \bar{b}\mathbf{1}\|^2} $$
  • R² = 1: Perfect fit (all residuals zero)
  • R² near 1: Excellent fit
  • R² near 0: Model explains little variance
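  • R² negative: Model worse than using the mean (for a least squares fit, possible only when the constant vector 𝟏 is not in Col(A), e.g. a model without an intercept)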
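A direct translation of the R² formula into code might look like this (a sketch assuming NumPy; `r_squared` is an illustrative helper name, and the data are a small three-point line fit):

```python
import numpy as np

def r_squared(A, b, x_hat):
    """R^2 = 1 - ||r||^2 / ||b - mean(b) * 1||^2, with r = b - A x_hat."""
    r = b - A @ x_hat
    total = b - b.mean()
    return 1.0 - (r @ r) / (total @ total)

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

r2 = r_squared(A, b, x_hat)
print(f"R^2 = {r2:.3f}")
```

For these data, ‖r‖² = 1/6 and ‖b − b̄𝟏‖² = 2/3, so R² = 1 − 1/4 = 0.75.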

Residual Analysis

The residuals 𝐫 = 𝐛 - A𝐱̂ should be:

  • Random (no pattern)
  • Normally distributed
  • Constant variance (homoscedastic)


Summary

Key Takeaways

  • Purpose: Find best approximate solution when no exact solution exists
  • Formula: 𝐱̂ = (AᵀA)⁻¹Aᵀb (for full column rank)
  • Normal Equations: AᵀA𝐱 = Aᵀb
  • Geometric: Projects b onto column space of A
  • Best for: Linear regression, curve fitting, overdetermined systems

💡 Pro Tip: For best numerical stability, use QR decomposition instead of solving normal equations directly, especially for ill-conditioned problems.


Try It Yourself!

Use the calculator above to solve least squares problems:

  1. Enter your matrix A (more rows than columns)
  2. Enter your vector b (right-hand side)
  3. Click "Calculate" to see:
    • The least squares solution 𝐱̂
    • The normal equations
    • Residuals and error norm
    • Step-by-step computation

📐 Try these examples:

  • Linear regression: Points (1,2), (2,3), (3,5), (4,6), (5,7)
  • Quadratic fit: Points (-2,4), (-1,1), (0,0), (1,1), (2,4)
  • Overdetermined system: 3 equations, 2 unknowns from the example above
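To cross-check the first two examples outside the calculator, a short NumPy sketch is enough. The helper `polyfit_ls` is an illustrative name; it builds the polynomial design matrix with `np.vander` and solves it with `np.linalg.lstsq`.

```python
import numpy as np

def polyfit_ls(x, y, degree):
    """Least squares polynomial coefficients, lowest power first."""
    A = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Linear regression example: (1,2), (2,3), (3,5), (4,6), (5,7)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y1 = np.array([2.0, 3.0, 5.0, 6.0, 7.0])
line = polyfit_ls(x1, y1, 1)
print("line coefficients [a, b]:", line)

# Quadratic fit example: (-2,4), (-1,1), (0,0), (1,1), (2,4)
x2 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y2 = np.array([4.0, 1.0, 0.0, 1.0, 4.0])
quad = polyfit_ls(x2, y2, 2)
print("quadratic coefficients [c0, c1, c2]:", quad)
```

The line fit should come out close to y = 0.7 + 1.3x, and the quadratic points lie exactly on y = x², so that fit has zero residual.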

📊 R² interpretation: After calculating, check the R² value. As a rough rule of thumb, values above 0.9 indicate an excellent fit and values below 0.5 suggest a poor one, though acceptable thresholds depend on the application.