Least Squares Regression - How to fit a curve

Engineers and scientists often need to analyse technical or experimental numerical data to find a relationship between the dependent and independent variables. Independent variables are frequently time periods or temperature changes, while dependent variables include fluid velocity or displacement. For a piece of machinery rotating at a constant acceleration, for example, the angular velocity has a linear relationship with time. Linear regression or interpolation gives the equation of the straight line that describes this relationship.

Equation of a straight line

Figure: Equation of a straight line (image by The Enviro Engineer)

A straight line relationship is defined by its slope and its intercept on the Y axis. Once these are known, the equation of the straight line can be used to estimate unknown values. It can find unknown values within the data range and also extrapolate values outwith the data range.
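As a minimal sketch of this idea, the Python below works out the slope and Y-axis intercept of the line through two known points and then evaluates the line inside and outside the data range. The function names and the sample readings are hypothetical and are only there for illustration.

```python
def line_through(x1, y1, x2, y2):
    """Return (slope, intercept) of the straight line through two points."""
    if x1 == x2:
        raise ValueError("x values must differ to define a slope")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1  # value of y where the line crosses x = 0
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical readings: time in seconds, angular velocity in rad/s.
m, c = line_through(0.0, 2.0, 4.0, 10.0)
inside = predict(m, c, 3.0)   # x = 3 lies within the data range 0..4
outside = predict(m, c, 6.0)  # x = 6 lies beyond it (extrapolation)
print(m, c, inside, outside)
```

Extrapolated values should be treated with more caution than interpolated ones, because nothing in the data confirms that the relationship stays linear beyond the measured range.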

When the relationship is precise, such as machinery rotating at a constant acceleration, interpolation may provide the most accurate results. Regression is the more useful mathematical method when the dependent variable is scattered and the relationship is not exact. The line of best fit, or trend line, is the line that passes as close as possible to the points in the experimental data.
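The line of best fit for scattered data can be sketched with the standard closed-form least squares formulas for a straight line. The function name and the sample data below are assumptions made for illustration, not values taken from the calculation sheets.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Hypothetical scattered measurements that roughly follow a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.3f} x + {intercept:.3f}")
```

Unlike interpolation, the fitted line is not forced through any individual point. It minimises the sum of the squared vertical distances between the points and the line.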

These data often don’t fit a straight line and are better described by a curve. Polynomial, Lagrange, or spline interpolation can sometimes achieve the best fit to the data. Alternatively, obtaining the best fit may require non-linear regression methods. There are different types of non-linear regression curves, including logarithmic, exponential, power, polynomial, and saturation growth curves. To find the best fit, it may be necessary to try several of these methods and compare the results. The coefficient of determination, R², is frequently used to measure how close the predictions are to the real data points. An R² value equal to 1 indicates a perfect fit to the real data points, and the closer the value is to 1, the better the fit.
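One common definition of the coefficient of determination is R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between observed and predicted values and SS_tot is the sum of squared differences between the observed values and their mean. The sketch below assumes that definition. The function name and the data are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all equal; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(observed, observed)             # predictions match the data exactly
rough = r_squared(observed, [1.5, 1.5, 3.5, 3.5])   # predictions miss every point by 0.5
print(perfect, rough)
```

Comparing this value across the candidate curve types is one way of choosing between them, although a higher R² on its own does not prove that a model is the right one for the underlying physics.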

Statistical Software for Regression Analysis

Software such as IBM’s Statistical Package for the Social Sciences* (SPSS) or Microsoft’s Excel* can be used for linear and non-linear analysis. Excel’s chart functions include various trend line options: exponential, linear, logarithmic, polynomial, power and moving average. Most of these trend line options can display the equation and R² value within the chart.

There are many methods of finding the best fit to the data. One of them is least squares regression, which chooses the curve that minimises the sum of the squared differences between the measured and predicted values. I have created eight SMath Studio* calculation sheets. These provide examples of how to obtain the best fit using the least squares method. Please be aware I was an engineer, not a mathematician. I therefore recommend that anyone wanting further information on numerical data analysis visit Wolfram MathWorld or other similar websites.

I have provided links on this page to commercial software packages for the benefit of visitors to my website. This is not an endorsement of these mathematical and statistical software packages.

Regression Analysis Calculation Sheets

Interpolation – The process of constructing a mathematical data point within a range of known data points.

Linear – The process of producing a mathematical function that provides the best fit to a series of experimental data points.

Exponential Curve – The process of producing a mathematical function that provides the best fit to a series of experimental data points.

Logarithmic Curve – The process of producing a mathematical function that provides the best fit to a series of experimental data points.

Polynomial Curve – The process of producing a mathematical function that provides the best fit to a series of experimental data points. The equation of the polynomial curve

Polynomial Curve – The process of producing a mathematical function that provides the best fit to a series of experimental data points. The equation of the polynomial curve

Power Curve – The process of producing a mathematical function that provides the best fit to a series of experimental data points.

Saturation Growth Curve – The process of producing a mathematical function that provides the best fit to a series of experimental data points.
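Several of the curve types listed above can be fitted with straight-line least squares after a change of variables. The sketch below assumes the commonly used model forms y = a·e^(b·x) (exponential), y = a·x^b (power), y = a + b·ln(x) (logarithmic) and y = a·x / (b + x) (saturation growth). These forms, the function names and the test data are assumptions for illustration and are not necessarily what the calculation sheets use.

```python
import math


def fit_line(xs, ys):
    """Least squares straight line; returns (slope, intercept)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    return slope, (sum_y - slope * sum_x) / n


def fit_exponential(xs, ys):
    """y = a*exp(b*x), fitted as ln(y) = ln(a) + b*x. Requires y > 0."""
    b, ln_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_power(xs, ys):
    """y = a*x**b, fitted as ln(y) = ln(a) + b*ln(x). Requires x, y > 0."""
    b, ln_a = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_logarithmic(xs, ys):
    """y = a + b*ln(x), a straight line in ln(x). Requires x > 0."""
    b, a = fit_line([math.log(x) for x in xs], ys)
    return a, b


def fit_saturation_growth(xs, ys):
    """y = a*x/(b + x), fitted as 1/y = 1/a + (b/a)*(1/x). Requires x, y != 0."""
    slope, intercept = fit_line([1.0 / x for x in xs], [1.0 / y for y in ys])
    a = 1.0 / intercept
    return a, slope * a


# Synthetic, noise-free data generated from known coefficients,
# so each fit should recover the values used to generate it.
xs = [1.0, 2.0, 4.0, 8.0]
exp_a, exp_b = fit_exponential(xs, [2.0 * math.exp(0.5 * x) for x in xs])
pow_a, pow_b = fit_power(xs, [1.5 * x ** 2.0 for x in xs])
log_a, log_b = fit_logarithmic(xs, [1.0 + 3.0 * math.log(x) for x in xs])
sat_a, sat_b = fit_saturation_growth(xs, [3.0 * x / (2.0 + x) for x in xs])
print(exp_a, exp_b, pow_a, pow_b, log_a, log_b, sat_a, sat_b)
```

Note that for the exponential, power and saturation growth forms the squared error is minimised in the transformed variables rather than in the original ones, so with noisy data the result can differ slightly from a true non-linear least squares fit.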