Multiple regression analysis requires a fair amount of preparatory work before the regression itself is run:
1. Generate a list of potential variables, both independent and dependent.
2. Collect data on the variables.
3. Check the relationship between each independent variable and the dependent variable using scatterplots and correlations.
4. Check the relationships among the independent variables using scatterplots and correlations.
5. (Optional) Conduct a simple linear regression for each independent variable/dependent variable (IV/DV) pair.
6. Use the non-redundant independent variables in the analysis to find the best-fitting model.
7. Use the best-fitting model to make predictions about the dependent variable.
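The correlation checks in steps 3 and 4 can be sketched in Python with NumPy. This is only an illustration: the data are synthetic, and the variable names (x1, x2, x3, y) and the 0.9 cutoff for flagging a redundant pair are assumptions, not values from any real study.

```python
import numpy as np

# Synthetic data (an assumption for illustration): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2, x3])
names = ["x1", "x2", "x3"]

# Step 3: correlation of each independent variable with the dependent variable.
iv_dv = {name: float(np.corrcoef(X[:, j], y)[0, 1])
         for j, name in enumerate(names)}

# Step 4: correlation matrix among the independent variables.
iv_iv = np.corrcoef(X, rowvar=False)

# Flag pairs whose absolute correlation exceeds an arbitrary cutoff.
CUTOFF = 0.9  # hypothetical threshold, chosen only for this sketch
redundant = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(iv_iv[i, j]) > CUTOFF
]

for name, r in iv_dv.items():
    print(f"corr({name}, y) = {r:.3f}")
print("possibly redundant pairs:", redundant)
```

A flagged pair is a candidate for dropping one of its members in step 6. Scatterplots are still worth drawing, since a correlation coefficient only captures linear association.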
Strong correlation among the independent variables (the subject of step 4) is called multicollinearity. One way to measure it is the variance inflation factor (VIF), which assesses how much the variance of an estimated regression coefficient increases when the predictors are correlated. If no predictors are correlated, the VIFs will all be 1. As a rule of thumb, a VIF between 5 and 10 indicates high correlation that may be problematic. If the VIF goes above 10, you can assume that the regression coefficients are poorly estimated due to multicollinearity.
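For predictor j, the VIF is 1 / (1 - R_j^2), where R_j^2 is the R-squared obtained by regressing predictor j on all the other predictors. A minimal sketch of that calculation with NumPy follows. The function name vif and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X.

    Each column is regressed (with an intercept) on the remaining columns,
    and VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors (an assumption): x2 is nearly collinear with x1.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)

values = vif(np.column_stack([x1, x2, x3]))
for name, v in zip(["x1", "x2", "x3"], values):
    print(f"VIF({name}) = {v:.1f}")
```

In this setup the two nearly collinear predictors should land well above 10, while the independent one should stay close to 1.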
After fitting a model, examine the difference between adjusted R-squared, R-Sq(adj), and predicted R-squared, R-Sq(pred). A large drop from R-Sq(adj) to R-Sq(pred) indicates overfitting--too many variables in the model.
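The comparison can be sketched with NumPy, assuming the usual definitions: R-Sq(adj) = 1 - (1 - R^2)(n - 1)/(n - p - 1), and R-Sq(pred) = 1 - PRESS/SST, where PRESS is the sum of squared leave-one-out prediction errors. The function name r2_summary and the synthetic data are assumptions for illustration. The example fits one model with a single real predictor and another padded with 20 pure-noise predictors.

```python
import numpy as np

def r2_summary(X, y):
    """Return (R^2, adjusted R^2, predicted R^2) for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sst = ((y - y.mean()) ** 2).sum()
    sse = resid @ resid
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Leverages (diagonal of the hat matrix) from a thin QR decomposition.
    Q, _ = np.linalg.qr(A)
    leverage = (Q ** 2).sum(axis=1)
    # PRESS: each residual rescaled to its leave-one-out prediction error.
    press = ((resid / (1.0 - leverage)) ** 2).sum()
    r2_pred = 1.0 - press / sst
    return r2, r2_adj, r2_pred

# Synthetic data (an assumption): only x_real actually drives y.
rng = np.random.default_rng(2)
n = 30
x_real = rng.normal(size=(n, 1))
noise_cols = rng.normal(size=(n, 20))
y = 2.0 * x_real[:, 0] + rng.normal(size=n)

small = r2_summary(x_real, y)
big = r2_summary(np.hstack([x_real, noise_cols]), y)

for label, (r2, adj, pred) in [("1 predictor", small), ("21 predictors", big)]:
    print(f"{label}: R-Sq={r2:.3f}  R-Sq(adj)={adj:.3f}  R-Sq(pred)={pred:.3f}")
```

The padded model should show a higher plain R-squared but a much larger gap between R-Sq(adj) and R-Sq(pred), which is the drop-off described above.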

