# Multiple Regression

## How the Multiple Regression Model Works

A relationship between one dependent variable and one independent variable (simple linear regression) can be represented by a single line.

A relationship between one dependent variable and more than one independent variable can be represented by multiple lines - which create a plane that can be used as a multiple regression (or multivariate) model.

The equation for the plane can be used to predict the dependent variable based on known values for the independent variables.

A plane is defined by at least three points which are not collinear (not all on the same line).

A plane contains at least three sets of possible outcomes.

Those outcomes are contained within:

- Sets of points located on lines representing the relationships between each of the independent variables (at least two) and the dependent variable.
- And a set of points outside of those lines (one) representing all other possible outcomes (aka error).

We assume, for the sake of the test, that the independent variables are not related to one another (not collinear). In fact, if they are related, that is a problem (known as multicollinearity) which may skew our results.

The plane (model) represents the only set of points that intersects each of the lines simultaneously. This is how the model is used to understand the simultaneous changes occurring between the independent variables.

For example - how the independent variables, Attempts, Passes, and Rebounds, simultaneously affect the dependent variable, Fantasy Points Per Game.

## Multiple Regression Equation

The equation for a multiple regression plane is similar to that of a simple regression line. We simply add in the additional variables and their coefficients (slopes).

For example, we are interested in how Fantasy Basketball Points Per Game are affected by Attempts, Passes, and Rebounds.

We can use our data to perform a multiple regression analysis. Our results will provide coefficients (similar to weights) that tell us the slope of each of the lines for our independent variables. These weights are multiplied by the value of each independent variable, and added together to find the predicted value of our dependent variable, Points Per Game.

The coefficients (represented by "B" aka Beta) ultimately tell us how much a one-unit change in each independent variable (Attempts, Passes, and Rebounds) increases or decreases the dependent variable, Fantasy Basketball Points Per Game:

FPPP = B0 + (B1 * Attempts) + (B2 * Passes) + (B3 * Rebounds)

B0 aka the intercept represents the value of the dependent variable, FPPP, when all of the independent variables (Attempts, Passes, and Rebounds) are equal to zero.

B1, B2, and B3 represent the coefficients or weights for each of the independent variables.
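As a rough sketch of how those coefficients are found, we can fit the plane equation above with ordinary least squares. The stat lines below are made-up numbers purely for illustration, not real data:

```python
import numpy as np

# Hypothetical per-game stats (values are invented for illustration).
attempts = np.array([12.0, 18.0, 15.0, 20.0, 9.0, 14.0])
passes   = np.array([ 4.0,  7.0,  6.0,  8.0, 3.0,  5.0])
rebounds = np.array([ 5.0,  9.0,  7.0, 11.0, 4.0,  6.0])
fppp     = np.array([22.0, 35.0, 29.0, 40.0, 16.0, 26.0])  # dependent variable

# Design matrix: a column of ones for the intercept B0, followed by
# one column per independent variable (for B1, B2, B3).
X = np.column_stack([np.ones_like(attempts), attempts, passes, rebounds])

# Ordinary least squares: find the coefficients that minimize the
# sum of squared residuals.
coefs, *_ = np.linalg.lstsq(X, fppp, rcond=None)
b0, b1, b2, b3 = coefs

# Use the plane equation to predict FPPP for a new stat line.
predicted = b0 + b1 * 16 + b2 * 6 + b3 * 8
```

Each coefficient is the slope for its variable while the others are held constant, which is exactly the "simultaneous" interpretation described above.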

The actual data points from our sample lie above or below the plane (set of predicted values). (Similar to how errors lie above and below a simple regression line.) The differences between the actual points and the points on the predicted plane are errors aka residuals.

Our goal is to minimize the errors in our model.
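A residual is just the actual value minus the predicted value, and "minimizing the errors" means minimizing the sum of their squares. A small sketch with made-up actual and predicted values:

```python
import numpy as np

# Hypothetical actual vs. predicted Fantasy Points Per Game
# (both columns are invented for illustration).
actual    = np.array([22.0, 35.0, 29.0, 40.0, 16.0, 26.0])
predicted = np.array([23.1, 33.8, 29.5, 39.2, 17.0, 25.4])

# Residuals: the vertical distance of each actual point from the plane.
residuals = actual - predicted

# Ordinary least squares chooses the plane that makes this as small as possible.
sum_squared_errors = np.sum(residuals ** 2)
```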

Errors can result from missing information (additional variables that should be included in the model), or from problems with the statistical analysis - for example, errors in data collection, bias, or typos.

Our measurement for the explanatory power of our model (R-squared) will actually only go up as we add variables - no matter which or how many we add. The idea is that the more information we have, the more powerful our analysis is.

Even adding an independent variable that has almost no relationship with the dependent variable will still increase the model's explanatory power, incrementally.

However, unnecessary variables that do not meaningfully contribute to the model may increase multicollinearity and cause unwarranted confusion in your analysis.
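We can see this "R-squared only goes up" behavior directly. Below, synthetic data is generated (all values are simulated, not real stats), and a column of pure noise is added to the model; the fit's R-squared still does not decrease:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 3 + 2 * x1 + rng.normal(size=n)   # y depends only on x1
noise = rng.normal(size=n)            # unrelated to y by construction

def r_squared(X, y):
    """R-squared of an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_base = r_squared(x1.reshape(-1, 1), y)
r2_plus = r_squared(np.column_stack([x1, noise]), y)
# r2_plus >= r2_base: adding any column can only lower
# (or leave unchanged) the residual sum of squares.
```

This is why a higher R-squared alone does not mean the extra variable belongs in the model.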

More on How to perform a Multi-Variate Analysis, Significance, and Conclusions to come!