
Linear Regression Calculator

Find the best-fit line and R² for any dataset

One point per line. Separate x and y with a comma or space.

Regression equation

y = 0.9515x + 1.2667

Slope (m)

0.9515

Intercept (b)

1.2667

R²

0.9515

Pearson r

0.9755

Data points

10

Mean x

5.5000

Mean y

6.5000

Std error

0.6898

R² = 95.15%: very strong fit; the model explains most variation

Predict y for a given x

View data table (10 points)
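| # | x | y | ŷ (predicted) | residual |
|---|---|---|---|---|
| 1 | 1 | 2 | 2.2182 | -0.2182 |
| 2 | 2 | 4 | 3.1697 | 0.8303 |
| 3 | 3 | 5 | 4.1212 | 0.8788 |
| 4 | 4 | 4 | 5.0727 | -1.0727 |
| 5 | 5 | 5 | 6.0242 | -1.0242 |
| 6 | 6 | 7 | 6.9758 | 0.0242 |
| 7 | 7 | 8 | 7.9273 | 0.0727 |
| 8 | 8 | 9 | 8.8788 | 0.1212 |
| 9 | 9 | 10 | 9.8303 | 0.1697 |
| 10 | 10 | 11 | 10.7818 | 0.2182 |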
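
As a check on the reported equation, the arithmetic for the ten points in the table above can be worked by hand. This is a sketch that takes the means from the results (mean x = 5.5, mean y = 6.5) and applies the standard least-squares formulas for slope and intercept:

$$S_{xx} = \sum_{i=1}^{10} (x_i - 5.5)^2 = 82.5, \qquad S_{xy} = \sum_{i=1}^{10} (x_i - 5.5)(y_i - 6.5) = 78.5$$

$$m = \frac{S_{xy}}{S_{xx}} = \frac{78.5}{82.5} \approx 0.9515, \qquad b = 6.5 - \frac{78.5}{82.5} \times 5.5 \approx 1.2667$$
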
## Linear Regression Calculator: Line of Best Fit

Linear regression is one of the most widely used statistical modeling techniques in science, economics, engineering, and social research. It finds the equation of the straight line that best describes the relationship between two variables, supporting both analysis and prediction.

### The Least Squares Method

The regression line y = mx + b is found by minimising the **sum of squared residuals** (SSR):

$$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Here ŷ is the value the line predicts for each point. The formulas for slope and intercept are:

$$m = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$

$$b = \bar{y} - m\bar{x}$$

### Key Statistics Explained

**Slope (m)**: The change in y for every 1-unit increase in x. A positive slope means x and y increase together; a negative slope means they move in opposite directions.

**Intercept (b)**: The predicted value of y when x = 0. Only meaningful if x = 0 is within or close to the data range.

**R² (coefficient of determination)**: The proportion of total variance in y explained by the model. For a regression with a single predictor, R² = r², where r is the Pearson correlation coefficient.

**Standard Error of Estimate**: The typical distance between the actual y values and the regression line, in the units of y, computed as the square root of SSR / (n − 2). Smaller values mean the points lie closer to the line.

### How to Use Regression for Prediction

Once you have y = mx + b, substitute any x value to predict y. The prediction input below the regression results does this for you.

**Important**: Only predict within (or close to) the range of your training data. Extrapolating far beyond the data range is unreliable.

### Real-World Examples

- **Economics**: Predict GDP growth from interest rate changes
- **Medicine**: Estimate blood pressure from age
- **Real estate**: Predict house price from square footage
- **Marketing**: Estimate sales from advertising spend
- **Physics**: Find acceleration from force-mass data
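
The least-squares formulas and the prediction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function names `fit_line` and `predict` are hypothetical, and the standard error is assumed to use n − 2 degrees of freedom, which is consistent with the value the calculator reports for the sample data.

```python
import math

def fit_line(points):
    """Least-squares fit of y = m*x + b to a list of (x, y) pairs."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points to compute a standard error")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = s_xy / s_xx              # slope = S_xy / S_xx
    b = mean_y - m * mean_x      # intercept = mean(y) - m * mean(x)
    ssr = sum((y - (m * x + b)) ** 2 for x, y in points)
    # Assumption: standard error of estimate uses n - 2 degrees of freedom.
    std_err = math.sqrt(ssr / (n - 2))
    return m, b, std_err

def predict(m, b, x):
    """Predicted y on the fitted line for a given x."""
    return m * x + b

# The ten sample points from the data table.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5),
          (6, 7), (7, 8), (8, 9), (9, 10), (10, 11)]
m, b, std_err = fit_line(points)
print(f"y = {m:.4f}x + {b:.4f}, std error = {std_err:.4f}")
# prints: y = 0.9515x + 1.2667, std error = 0.6898
print(f"prediction at x = 5.5: {predict(m, b, 5.5):.4f}")
# prints: prediction at x = 5.5: 6.5000
```

The prediction at x = 5.5 equals the mean of y because a least-squares line with an intercept always passes through the point (mean x, mean y).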

Frequently Asked Questions

What is linear regression?
Linear regression finds the best-fitting straight line through a set of data points using the method of least squares. The resulting equation y = mx + b minimises the sum of squared vertical distances (residuals) between each point and the line. The slope m describes the rate of change and the intercept b gives the y-value when x = 0.
What does R² (R-squared) tell us?
R² measures how well the regression line fits the data. It ranges from 0 to 1: R² = 1 means the line perfectly predicts every point; R² = 0 means the line is no better than predicting the mean of y. An R² of 0.85 means 85% of the variance in y is explained by x.
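
To make the definition concrete, R² can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of y. The sketch below assumes a single-predictor least-squares fit; the function name `r_squared` is hypothetical.

```python
def r_squared(points):
    """R^2 of the least-squares line through a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = s_xy / s_xx
    b = mean_y - m * mean_x
    # Variation left over after fitting the line.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in points)
    # Variation around the mean of y (the "predict the mean" baseline).
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

sample = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5),
          (6, 7), (7, 8), (8, 9), (9, 10), (10, 11)]
print(f"{r_squared(sample):.4f}")                    # prints: 0.9515
print(f"{r_squared([(0, 1), (1, 3), (2, 5)]):.4f}")  # prints: 1.0000
```

The second call uses three points that lie exactly on y = 2x + 1, so no variation is left unexplained.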
What is the difference between correlation and regression?
Correlation (Pearson r) measures the strength and direction (positive or negative) of the linear relationship between two variables. It is symmetric: the correlation of x with y equals the correlation of y with x, so it does not distinguish a predictor from a response. Regression goes further: it quantifies how much y changes per unit change in x, and it can be used to make predictions. Regression also yields R² (= r² for a single predictor), which tells you how much variance is explained.
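
One way to see the difference is to swap the roles of x and y: the correlation stays the same, but the regression slope changes. The following sketch uses the sample dataset from the calculator; the helper names (`centered_sums`, `pearson_r`, `slope`) are hypothetical.

```python
import math

def centered_sums(xs, ys):
    """Return (S_xx, S_yy, S_xy) for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    s_xx, s_yy, s_xy = centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    s_xx, _, s_xy = centered_sums(xs, ys)
    return s_xy / s_xx

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 5, 4, 5, 7, 8, 9, 10, 11]

r_xy = pearson_r(xs, ys)   # correlation of x with y
r_yx = pearson_r(ys, xs)   # same value with the roles swapped
m_yx = slope(xs, ys)       # regress y on x
m_xy = slope(ys, xs)       # regress x on y: a different line

print(f"r = {r_xy:.4f} either way")               # prints: r = 0.9755 either way
print(f"slope of y on x = {m_yx:.4f}")            # prints: slope of y on x = 0.9515
print(f"slope of x on y = {m_xy:.4f}")            # prints: slope of x on y = 1.0000
print(f"product of slopes = {m_yx * m_xy:.4f}")   # prints: product of slopes = 0.9515
```

The product of the two slopes equals r², which ties the two regressions back to the single correlation value.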
How do I interpret the slope?
The slope m tells you how many units y changes for each 1-unit increase in x. For example, if m = 2.5, then for every additional unit of x, y increases by 2.5. A negative slope means y decreases as x increases.
What is a residual in regression?
A residual is the difference between the observed y value and the predicted y value at a given x: residual = y − ŷ. Small residuals mean the model fits well at that point. Large residuals may indicate outliers or points the line fits poorly. The regression line minimises the sum of all squared residuals.
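
As an illustration, the residuals for the calculator's sample data can be computed directly from the fitted line. This is a sketch with hypothetical variable names. It also shows a standard property of least-squares fits that include an intercept: the residuals sum to zero, up to floating-point rounding.

```python
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5),
          (6, 7), (7, 8), (8, 9), (9, 10), (10, 11)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
m = (sum((x - mean_x) * (y - mean_y) for x, y in points)
     / sum((x - mean_x) ** 2 for x, _ in points))
b = mean_y - m * mean_x

# residual = observed y minus predicted y
residuals = [y - (m * x + b) for x, y in points]

# Index of the point that sits farthest from the line.
worst = max(range(n), key=lambda i: abs(residuals[i]))

print(f"first residual: {residuals[0]:.4f}")   # prints: first residual: -0.2182
print(f"largest |residual| at point {points[worst]}: {residuals[worst]:.4f}")
# prints: largest |residual| at point (4, 4): -1.0727
print(f"sum of residuals is near zero: {abs(sum(residuals)) < 1e-9}")
# prints: sum of residuals is near zero: True
```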
When should I not use linear regression?
Linear regression assumes a straight-line relationship. If the scatter plot shows a curve, exponential growth, or another non-linear pattern, linear regression will give poor predictions. Also avoid it if there are influential outliers that distort the slope, or if the residuals show a systematic, non-random pattern.
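
A small made-up example shows why checking the residuals matters. The data below follows y = x² exactly (an illustrative dataset, not one from the calculator). A straight line still achieves a high R², but the residuals form a clear U shape, which signals that the linear model is the wrong one.

```python
# Illustrative curved data: y = x^2 for x = 0..6.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - m * mean_x

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

print(f"fitted line: y = {m:.1f}x + {b:.1f}")   # prints: fitted line: y = 6.0x + -5.0
print(f"R^2 = {r2:.4f}")                        # prints: R^2 = 0.9231
print(residuals)
# prints: [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The residuals are positive at both ends and negative in the middle rather than scattered randomly around zero. That pattern is the kind of systematic structure described above, and it appears even though R² is above 0.9.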